Analysis of methylated DNA comprising methylation-sensitive or methylation-dependent restrictions

ABSTRACT

The present disclosure provides compositions and methods related to analyzing DNA, such as cell-free DNA. In some embodiments, the cell-free DNA is from a subject having or suspected of having cancer and/or the cell-free DNA includes DNA from cancer cells. In some embodiments, the DNA is partitioned into a first subsample and a second subsample, wherein the first subsample comprises DNA with a nucleotide modification (e.g., a cytosine modification) in a greater proportion than the second subsample, and the second subsample is contacted with a methylation-dependent nuclease.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/086,000, filed Sep. 30, 2020, and U.S. Provisional Patent Application No. 63/105,183, filed Oct. 23, 2020, each of which is incorporated by reference herein in its entirety for all purposes.

FIELD OF THE INVENTION

The present disclosure provides compositions and methods related to analyzing DNA, such as cell-free DNA. In some embodiments, the cell-free DNA is from a subject having or suspected of having cancer and/or the cell-free DNA includes DNA from cancer cells. In some embodiments, the DNA is partitioned into a first subsample and a second subsample, wherein the first subsample comprises DNA with a nucleotide modification (e.g., a cytosine modification) in a greater proportion than the second subsample, and the second subsample is contacted with a methylation-dependent nuclease.

INTRODUCTION AND SUMMARY

Cancer is responsible for millions of deaths per year worldwide. Early detection of cancer may result in improved outcomes because early-stage cancer tends to be more susceptible to treatment.

Improperly controlled cell growth is a hallmark of cancer that generally results from an accumulation of genetic and epigenetic changes, such as copy number variations (CNVs), single nucleotide variations (SNVs), gene fusions, insertions and/or deletions (indels), epigenetic variations including modification of cytosine (e.g., 5-methylcytosine, 5-hydroxymethylcytosine, and other more oxidized forms) and association of DNA with chromatin proteins and transcription factors.

Biopsies represent a traditional approach for detecting or diagnosing cancer in which cells or tissue are extracted from a possible site of cancer and analyzed for relevant phenotypic and/or genotypic features. Biopsies have the drawback of being invasive.

Detection of cancer based on analysis of body fluids (“liquid biopsies”), such as blood, is an intriguing alternative based on the observation that DNA from cancer cells is released into body fluids. A liquid biopsy is noninvasive (sometimes requiring only a blood draw). Current methods of cancer diagnostic assays of cell-free nucleic acids (e.g., cell-free DNA or cell-free RNA) may focus on the detection of tumor-related somatic variants, including single nucleotide variants (SNVs), copy number variations (CNVs), fusions, and indels (i.e., insertions or deletions), which are all mainstream targets for liquid biopsy. There is growing evidence that non-sequence modifications like methylation status and fragmentomic signal in cell-free DNA can provide information on the source of cell-free DNA and disease level. The non-sequence modifications of the cell-free DNA, when combined with somatic mutation calling, can yield a more comprehensive assessment of tumor status than that available from either approach alone. However, it has been challenging to develop accurate and sensitive methods for analyzing liquid biopsy material that provide detailed information regarding nucleobase modifications given the low concentration and heterogeneity of cell-free DNA.

Isolating and processing the fractions of cell-free DNA useful for further analysis in liquid biopsy procedures is an important part of these methods. Accordingly, there is a need for improved methods and compositions for analyzing cell-free DNA, e.g., in liquid biopsies.

The present disclosure aims to meet the need for improved analysis of cell-free DNA and/or provide other benefits. Accordingly, the following exemplary embodiments are provided.

Embodiment 1 is a method of analyzing DNA in a sample, the method comprising:

a) partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample;

b) contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample; and

c) capturing a first target region set comprising epigenetic target regions from at least a portion of the first subsample or the treated first subsample, and capturing a second target region set comprising epigenetic target regions from at least a portion of the treated second subsample.

Embodiment 2 is a method of analyzing DNA in a sample, the method comprising:

a) capturing a first target region set comprising epigenetic target regions from the sample;

b) partitioning the target region set into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample; and

c) contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample.

Embodiment 3 is the method of embodiment 1, further comprising quantifying epigenetic target regions captured from or present in one or more of the first subsample, the treated first subsample, or the treated second subsample.

Embodiment 4 is the method of embodiment 2, wherein the quantifying comprises quantitative amplification, optionally wherein the quantitative amplification is quantitative PCR.

Embodiment 5 is the method of any one of the preceding embodiments, further comprising sequencing DNA in the first target region set and the second target region set or in the treated second subsample.

Embodiment 6 is the method of the immediately preceding embodiment, wherein DNA in the treated second subsample and DNA in the treated first subsample is sequenced.

Embodiment 7 is the method of any one of the preceding embodiments, wherein the epigenetic target regions comprise a hypomethylation variable target region set.

Embodiment 8 is the method of the immediately preceding embodiment, wherein the hypomethylation variable target region set comprises regions having a lower degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

Embodiment 9 is the method of the immediately preceding embodiment, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

Embodiment 10 is the method of any one of embodiments 7-9, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

Embodiment 11 is a method of analyzing DNA in a sample, the method comprising:

a) partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample;

b) contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample; and

c) capturing a first target region set comprising epigenetic target regions from at least a portion of the first subsample or the treated first subsample.

Embodiment 12 is the method of embodiment 11, further comprising quantifying epigenetic target regions captured from one or more of the first subsample or the treated first subsample, or present in the treated second subsample.

Embodiment 13 is the method of embodiment 12, wherein the quantifying comprises quantitative amplification, optionally wherein the quantitative amplification is quantitative PCR.

Embodiment 14 is the method of any one of embodiments 11-13, further comprising sequencing DNA in the first target region set and DNA from the second subsample.

Embodiment 15 is the method of any one of the preceding embodiments, wherein the DNA comprises DNA obtained from a bodily fluid, optionally wherein the bodily fluid is plasma, urine, lymph, or spinal fluid.

Embodiment 16 is the method of any one of the preceding embodiments, wherein the DNA comprises cell-free DNA (cfDNA) obtained from a test subject.

Embodiment 17 is the method of any one of the preceding embodiments, wherein the cytosine modification is methylation.

Embodiment 18 is the method of any one of the preceding embodiments, wherein the cytosine modification is methylation at the 5 position of cytosine.

Embodiment 19 is the method of any one of the preceding embodiments, wherein the first subsample is contacted with a methylation-sensitive endonuclease.

Embodiment 20 is the method of the immediately preceding embodiment, wherein the methylation-sensitive endonuclease cleaves an unmethylated CpG sequence.

Embodiment 21 is the method of any one of the preceding embodiments, wherein the methylation-sensitive endonuclease is one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI.

Embodiment 22 is the method of the immediately preceding embodiment, wherein the methylation-sensitive endonuclease is one or more of BstUI, HpaII, Hin6I, HhaI, or AccII, optionally wherein the methylation-sensitive endonuclease is (i) BstUI and HpaII; (ii) BstUI, HpaII, and Hin6I; or (iii) HhaI and AccII.

Embodiment 23 is the method of any one of the preceding embodiments, wherein the methylation-dependent endonuclease cleaves a methylated CpG sequence.

Embodiment 24 is the method of any one of the preceding embodiments, wherein the methylation-dependent endonuclease is one or more of MspJI, LpnPI, FspEI, or McrBC.

Embodiment 25 is the method of any one of the preceding embodiments, wherein the first target region set comprises a hypermethylation variable target region set.

Embodiment 26 is the method of the immediately preceding embodiment, wherein the hypermethylation variable target region set comprises regions having a higher degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.

Embodiment 27 is the method of embodiment 25 or 26, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.

Embodiment 28 is the method of any one of embodiments 25-27, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.

Embodiment 29 is the method of any one of the preceding embodiments, wherein the epigenetic target regions comprise a methylation control target region set.

Embodiment 30 is the method of any one of the preceding embodiments, wherein the first and/or second epigenetic target region set comprise a fragmentation variable target region set.

Embodiment 31 is the method of the immediately preceding embodiment, wherein the fragmentation variable target region set comprises transcription start site regions.

Embodiment 32 is the method of embodiment 30 or 31, wherein the fragmentation variable target region set comprises CTCF binding regions.

Embodiment 33 is the method of any one of the preceding embodiments, wherein the first target region set further comprises sequence-variable target regions.

Embodiment 34 is the method of any one of the preceding embodiments, wherein the second target region set further comprises sequence-variable target regions.

Embodiment 35 is the method of embodiment 33 or 34, wherein DNA molecules corresponding to the sequence-variable target region set are captured with a greater capture yield than DNA molecules corresponding to the epigenetic target region set.

Embodiment 36 is the method of any one of the preceding embodiments, wherein capturing comprises contacting DNA to be captured with a set of target-specific probes, whereby complexes of target-specific probes and DNA are formed.

Embodiment 37 is the method of the immediately preceding embodiment, wherein capturing further comprises separating the complexes from DNA not bound to target-specific probes, thereby providing captured DNA.

Embodiment 38 is the method of embodiment 36 or 37, wherein the set of target-specific probes is configured to capture DNA corresponding to the sequence-variable target region set with a greater capture yield than DNA corresponding to the epigenetic target region set.

Embodiment 39 is the method of any one of embodiments 33-38, comprising sequencing DNA molecules corresponding to the sequence-variable target region set to a greater depth of sequencing than DNA molecules corresponding to the epigenetic target region set.

Embodiment 40 is the method of any one of the preceding embodiments, wherein the DNA is amplified before the sequencing step, or the DNA is amplified before the capturing step.

Embodiment 41 is the method of any one of the preceding embodiments, further comprising ligating adapters to the DNA before capture, optionally wherein the ligating occurs before or simultaneously with amplification.

Embodiment 42 is the method of the immediately preceding embodiment, wherein the adapters are barcode-containing adapters.

Embodiment 43 is the method of embodiment 41 or 42, wherein the adapters are ligated to the DNA before contacting the second subsample with a methylation-dependent nuclease.

Embodiment 44 is the method of the immediately preceding embodiment, wherein the adapters are resistant to digestion by the methylation-dependent nuclease.

Embodiment 45 is the method of the immediately preceding embodiment, wherein the adapters are unmethylated.

Embodiment 46 is the method of any one of the preceding embodiments, wherein partitioning the sample into a plurality of subsamples comprises partitioning on the basis of methylation level.

Embodiment 47 is the method of the immediately preceding embodiment, wherein the partitioning step comprises contacting the collected cfDNA with a methyl binding reagent immobilized on a solid support.

Embodiment 48 is the method of any one of the preceding embodiments, comprising differentially tagging the first subsample and second subsample, the treated first subsample and the second subsample, the first subsample and the treated second subsample, or the treated first subsample and the treated second subsample.

Embodiment 49 is the method of the immediately preceding embodiment, wherein the treated first subsample and the second subsample, the first subsample and the treated second subsample, or the treated first subsample and the treated second subsample are pooled after contacting the first subsample with the methylation-sensitive endonuclease and/or contacting the second subsample with a methylation-dependent nuclease.

Embodiment 50 is the method of any one of embodiments 47-49, wherein DNA from the treated first subsample and the second subsample, the first subsample and the treated second subsample, or the treated first subsample and the treated second subsample is sequenced in the same sequencing cell.

Embodiment 51 is the method of any one of the preceding embodiments, wherein the plurality of subsamples comprises a third subsample, which comprises DNA with a cytosine modification in a greater proportion than the second subsample but in a lesser proportion than the first subsample.

Embodiment 52 is the method of the immediately preceding embodiment, wherein the method further comprises differentially tagging the third subsample.

Embodiment 53 is the method of the immediately preceding embodiment, wherein the first, second, and third subsamples are combined after contacting the first subsample with the methylation-sensitive endonuclease and/or contacting the second subsample with a methylation-dependent nuclease, optionally wherein DNA from the first, second, and third subsamples is sequenced in the same sequencing cell.

Embodiment 54 is the method of any one of embodiments 51-53, wherein the third subsample is contacted with a methylation-sensitive endonuclease.

Embodiment 55 is the method of any one of embodiments 52-53, wherein the third subsample is combined with the first subsample, and the combined first and third subsamples are contacted with a methylation-sensitive endonuclease.

Embodiment 56 is the method of the immediately preceding embodiment, wherein combined first and third subsamples are further combined with the second subsample after contacting the first and third subsamples with the methylation-sensitive endonuclease and contacting the second subsample with a methylation-dependent nuclease, optionally wherein DNA from the first, second, and third subsamples is sequenced in the same sequencing cell.

Embodiment 57 is the method of any one of the preceding embodiments, wherein the methylation-dependent nuclease is heat-inactivated after degrading nonspecifically partitioned DNA.

Embodiment 58 is the method of any one of the preceding embodiments, wherein the first subsample is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity.

Embodiment 59 is the method of the immediately preceding embodiment, wherein the procedure to which the first subsample is subjected alters base-pairing specificity of the first nucleobase without substantially altering base-pairing specificity of the second nucleobase.

Embodiment 60 is the method of embodiment 58 or 59, wherein the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine.

Embodiment 61 is the method of any one of embodiments 58-60, wherein the first nucleobase comprises unmodified cytosine (C).

Embodiment 62 is the method of any one of embodiments 58-61, wherein the second nucleobase comprises 5-methylcytosine (mC).

Embodiment 63 is the method of any one of embodiments 58-62, wherein the procedure to which the first subsample is subjected comprises bisulfite conversion.

Embodiment 64 is the method of any one of embodiments 58-60, wherein the first nucleobase comprises mC.

Embodiment 65 is the method of any one of embodiments 58-62, wherein the second nucleobase comprises 5-hydroxymethylcytosine (hmC).

Embodiment 66 is the method of embodiment 62, wherein the procedure to which the first subsample is subjected comprises protection of 5hmC.

Embodiment 67 is the method of embodiment 65, wherein the procedure to which the first subsample is subjected comprises Tet-assisted bisulfite conversion.

Embodiment 68 is the method of embodiment 65, wherein the procedure to which the first subsample is subjected comprises Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.

Embodiment 69 is the method of embodiment 68, wherein the substituted borane reducing agent is 2-picoline borane or borane pyridine.

Embodiment 70 is the method of any one of embodiments 58-60, 64-66, or 68-69, wherein the second nucleobase comprises C.

Embodiment 71 is the method of any one of embodiments 64-66 or 70, wherein the procedure to which the first subsample is subjected comprises protection of hmC followed by Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.

Embodiment 72 is the method of embodiment 71, wherein the substituted borane reducing agent is 2-picoline borane or borane pyridine.

Embodiment 73 is the method of any one of embodiments 61, 62, 64-66, or 70, wherein the procedure to which the first subsample is subjected comprises protection of hmC followed by deamination of mC and/or C.

Embodiment 74 is the method of embodiment 73, wherein the deamination of mC and/or C comprises treatment with an AID/APOBEC family DNA deaminase enzyme.

Embodiment 75 is the method of any one of embodiments 66 or 70-74, wherein protection of hmC comprises glucosylation of hmC.

Embodiment 76 is the method of any one of embodiments 58-60, 62, 64, or 70, wherein the procedure to which the first subsample is subjected comprises chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane.

Embodiment 77 is the method of embodiment 76, wherein the substituted borane reducing agent is 2-picoline borane or borane pyridine.

Embodiment 78 is the method of any one of embodiments 58-60, 62, 64, 70, or 76-77, wherein the first nucleobase comprises hmC.

Embodiment 79 is the method of any one of the preceding embodiments, wherein the DNA of the first subsample and the DNA of the second subsample are differentially tagged; after differential tagging, a portion of DNA from the second subsample is added to the first subsample or treated first subsample or at least a portion thereof, thereby forming a pool; and sequence-variable target regions and epigenetic target regions are captured from the pool.

Embodiment 80 is the method of the immediately preceding embodiment, wherein the pool comprises less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the second subsample.

Embodiment 81 is the method of the immediately preceding embodiment, wherein the pool comprises about 70-90%, about 75-85%, or about 80% of the DNA of the second subsample.

Embodiment 82 is the method of any one of embodiments 79-81, wherein the pool comprises substantially all of the DNA of the first subsample.

Embodiment 83 is the method of any one of embodiments 79-82, wherein the pool comprises substantially all of the DNA of the first subsample or treated first subsample.

Embodiment 84 is the method of any one of embodiments 79-83, wherein the first target region set is captured from at least a portion of the first subsample or treated first subsample after formation of the pool.

Embodiment 85 is the method of any one of the preceding embodiments, further comprising determining a likelihood that the subject has cancer.

Embodiment 86 is the method of the immediately preceding embodiment, wherein the sequencing generates a plurality of sequencing reads; and the method further comprises mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads, and processing the mapped sequence reads corresponding to the sequence-variable target region set and to the epigenetic target region set to determine the likelihood that the subject has cancer.

Embodiment 87 is the method of any one of embodiments 1-85, wherein the test subject was previously diagnosed with a cancer and received one or more previous cancer treatments, optionally wherein the cfDNA is obtained at one or more preselected time points following the one or more previous cancer treatments, and sequencing the captured set of cfDNA molecules, whereby a set of sequence information is produced.

Embodiment 88 is the method of the immediately preceding embodiment, further comprising detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint using the set of sequence information.

Embodiment 89 is the method of the immediately preceding embodiment, further comprising determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject, optionally further comprising determining a cancer recurrence status based on the cancer recurrence score, wherein the cancer recurrence status of the test subject is determined to be at risk for cancer recurrence when a cancer recurrence score is determined to be at or above a predetermined threshold or the cancer recurrence status of the test subject is determined to be at lower risk for cancer recurrence when the cancer recurrence score is below the predetermined threshold.

Embodiment 90 is the method of the immediately preceding embodiment, further comprising comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, wherein the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for a subsequent cancer treatment when the cancer recurrence score is below the cancer recurrence threshold.
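By way of illustration only, the threshold comparisons recited in Embodiments 89 and 90 may be sketched as follows. The function names, the numeric score scale, and the example values are hypothetical and are not taken from the disclosure; how the cancer recurrence score itself is computed is not shown. Embodiment 90 does not state the classification when the score equals the threshold, so the sketch returns no classification in that case.

```python
from typing import Optional


def recurrence_status(score: float, threshold: float) -> str:
    """Embodiment 89: at or above the threshold is 'at risk'; below is 'lower risk'."""
    return "at risk" if score >= threshold else "lower risk"


def treatment_candidate(score: float, threshold: float) -> Optional[bool]:
    """Embodiment 90: above the threshold is a candidate; below is not.

    The text does not specify the case score == threshold, so None is
    returned there (an assumption of this sketch, not of the disclosure).
    """
    if score > threshold:
        return True
    if score < threshold:
        return False
    return None


# Hypothetical values chosen only to exercise each branch.
print(recurrence_status(0.7, 0.5))
print(treatment_candidate(0.3, 0.5))
```

Keeping the two comparisons in separate functions mirrors the text, in which the recurrence status uses an inclusive threshold while the candidate classification is stated only for scores strictly above or below the threshold.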

I. BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a schematic diagram showing a methylation dependent nuclease (e.g., methylation dependent restriction enzyme (MDRE)) digesting/cleaving the DNA where the restriction enzyme (RE) recognition site contains a methylated nucleotide but not cleaving the DNA where the restriction enzyme (RE) recognition site contains an unmethylated nucleotide. FIG. 1B is a schematic diagram of a methylation sensitive nuclease (e.g., methylation sensitive restriction enzyme (MSRE)) digesting/cleaving the DNA where the restriction enzyme (RE) recognition site contains an unmethylated nucleotide but not cleaving the DNA where the restriction enzyme (RE) recognition site contains a methylated nucleotide.

FIG. 2 is a flow chart representation of a method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject according to an embodiment of the disclosure.

FIG. 3 is a flow chart representation of a method for determining the methylation status of nucleic acid molecules in a polynucleotide sample obtained from a subject according to an embodiment of the disclosure.

FIG. 4 is a schematic diagram of a method for detecting the presence or absence of cancer in a subject according to certain embodiments of the disclosure.

FIG. 5 is a schematic diagram of an example of a system suitable for use with some embodiments of the disclosure.

FIG. 6 shows the molecule count in the three partitions with and without MSRE treatments in normal and diluted CRC samples.

FIG. 7 shows CpG methylation quantification results obtained as described in Example 2 for three samples from subjects with early stage colorectal cancer (“Early CRC”) and three healthy subjects (“Normal”). For the Early CRC plots, MAF indicates mutant allele fraction.

FIGS. 8A-D show counts of positive and negative control molecules having FspEI palindromic sites for the indicated enzyme and buffer conditions, as described in Example 4. FIGS. 8A and 8C correspond to a first donor and FIGS. 8B and 8D correspond to a second donor. Data points are distributed along the horizontal axis for readability.

FIGS. 9A-D show digestion efficiency and positive control molecule counts as described in Example 4.

FIGS. 10A-J show hypomethylation variable target region (“Hypo VTR”) molecule counts (10A-E) or Hypo VTR/negative control molecule ratios (10F-J) for the indicated conditions as described in Example 5. Data points are distributed along the horizontal axis for readability. Triangles, circles, plus signs, and squares indicate that the source of the normal cfDNA was the first, second, third, or fourth of four healthy donors, respectively.
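For illustration only, the cleavage behavior described for FIGS. 1A and 1B may be sketched as a toy model. The model assumes a single recognition site per molecule and complete digestion, neither of which is asserted by the disclosure; the class, function, and molecule names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Molecule:
    name: str
    site_methylated: bool  # methylation state of the single RE recognition site


def is_cleaved(nuclease: str, molecule: Molecule) -> bool:
    """MDRE cleaves methylated sites (FIG. 1A); MSRE cleaves unmethylated sites (FIG. 1B)."""
    if nuclease == "MDRE":
        return molecule.site_methylated
    if nuclease == "MSRE":
        return not molecule.site_methylated
    raise ValueError(f"unknown nuclease type: {nuclease}")


def surviving(nuclease: str, molecules: List[Molecule]) -> List[Molecule]:
    """Return the molecules that are not cleaved by the given nuclease type."""
    return [m for m in molecules if not is_cleaved(nuclease, m)]


# A hypomethylated partition that contains one nonspecifically partitioned
# (methylated) molecule; treatment with an MDRE removes it in this toy model.
hypo_partition = [
    Molecule("unmethylated_target", False),
    Molecule("nonspecifically_partitioned", True),
]
print([m.name for m in surviving("MDRE", hypo_partition)])
```

The same function applied with "MSRE" to a hypermethylated partition would instead remove unmethylated molecules, which corresponds to the optional treatment of the first subsample.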

II. DETAILED DESCRIPTION OF CERTAIN EMBODIMENTS

Reference will now be made in detail to certain embodiments of the invention. While the invention will be described in conjunction with such embodiments, it will be understood that they are not intended to limit the invention to those embodiments. On the contrary, the invention is intended to cover all alternatives, modifications, and equivalents, which may be included within the invention as defined by the appended claims.

Before describing the present teachings in detail, it is to be understood that the disclosure is not limited to specific compositions or process steps, as such may vary. It should be noted that, as used in this specification and the appended claims, the singular forms “a”, “an” and “the” include plural references unless the context clearly dictates otherwise. Thus, for example, reference to “a nucleic acid” includes a plurality of nucleic acids, reference to “a cell” includes a plurality of cells, and the like.

Numeric ranges are inclusive of the numbers defining the range. Measured and measurable values are understood to be approximate, taking into account significant digits and the error associated with the measurement. Also, the use of “comprise”, “comprises”, “comprising”, “contain”, “contains”, “containing”, “include”, “includes”, and “including” is not intended to be limiting. It is to be understood that both the foregoing general description and detailed description are exemplary and explanatory only and are not restrictive of the teachings.

Unless specifically noted in the above specification, embodiments in the specification that recite “comprising” various components are also contemplated as “consisting of” or “consisting essentially of” the recited components; embodiments in the specification that recite “consisting of” various components are also contemplated as “comprising” or “consisting essentially of” the recited components; and embodiments in the specification that recite “consisting essentially of” various components are also contemplated as “consisting of” or “comprising” the recited components (this interchangeability does not apply to the use of these terms in the claims).

The section headings used herein are for organizational purposes and are not to be construed as limiting the disclosed subject matter in any way. In the event that any document or other material incorporated by reference contradicts any explicit content of this specification, including definitions, this specification controls.

A. Definitions

“Cell-free DNA,” “cfDNA molecules,” or simply “cfDNA” include DNA molecules that naturally occur in a subject in extracellular form (e.g., in blood, serum, plasma, or other bodily fluids such as lymph, cerebrospinal fluid, urine, or sputum). While the cfDNA originally existed in a cell or cells in a large complex biological organism, e.g., a mammal, it has undergone release from the cell(s) into a fluid found in the organism, and may be obtained from a sample of the fluid without the need to perform an in vitro cell lysis step.

As used herein, “cellular nucleic acids” means nucleic acids that are disposed within one or more cells from which the nucleic acids have originated, at least at the point a sample is taken or collected from a subject, even if those nucleic acids are subsequently removed (e.g., via cell lysis) as part of a given analytical process.

As used herein, a modification or other feature is present in “a greater proportion” in a first sample or population of nucleic acid than in a second sample or population when the fraction of nucleotides with the modification or other feature is higher in the first sample or population than in the second sample or population. For example, if in a first sample, one tenth of the nucleotides are mC, and in a second sample, one twentieth of the nucleotides are mC, then the first sample comprises the cytosine modification of 5-methylation in a greater proportion than the second sample.
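The comparison in the preceding definition can be illustrated with a minimal sketch. The function names and the nucleotide counts below are hypothetical and chosen only to reproduce the one-tenth versus one-twentieth example; they are not taken from the disclosure.

```python
from fractions import Fraction


def modified_fraction(modified_nucleotides: int, total_nucleotides: int) -> Fraction:
    """Fraction of nucleotides in a sample that carry the modification."""
    if total_nucleotides <= 0:
        raise ValueError("total_nucleotides must be positive")
    return Fraction(modified_nucleotides, total_nucleotides)


def has_greater_proportion(first: Fraction, second: Fraction) -> bool:
    """True when the first sample has the modification in a greater proportion."""
    return first > second


# Hypothetical counts: one tenth vs. one twentieth of nucleotides are mC.
first_sample = modified_fraction(100, 1000)
second_sample = modified_fraction(50, 1000)
first_is_greater = has_greater_proportion(first_sample, second_sample)
```

Exact rational arithmetic is used here only so that the comparison is not affected by floating-point rounding.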

As used herein, “without substantially altering base-pairing specificity” of a given nucleobase means that a majority of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of the second nucleobase relative to its base pairing specificity as it was in the originally isolated sample. In some embodiments, 75%, 90%, 95%, or 99% of molecules comprising that nucleobase that can be sequenced do not have alterations of the base pairing specificity of the second nucleobase relative to its base pairing specificity as it was in the originally isolated sample.

As used herein, “base pairing specificity” refers to the standard DNA base (A, C, G, or T) with which a given base most preferentially pairs. Thus, for example, unmodified cytosine and 5-methylcytosine have the same base pairing specificity (i.e., specificity for G) whereas uracil and cytosine have different base pairing specificity because uracil has base pairing specificity for A while cytosine has base pairing specificity for G. The ability of uracil to form a wobble pair with G is irrelevant because uracil nonetheless most preferentially pairs with A among the four standard DNA bases.

As used herein, a “combination” comprising a plurality of members refers to either a single composition comprising the members or a set of compositions in proximity, e.g., in separate containers or compartments within a larger container, such as a multiwell plate, tube rack, refrigerator, freezer, incubator, water bath, ice bucket, machine, or other form of storage.

The “capture yield” of a collection of probes for a given target set refers to the amount (e.g., amount relative to another target set or an absolute amount) of nucleic acid corresponding to the target set that the collection of probes captures under typical conditions. Exemplary typical capture conditions are an incubation of the sample nucleic acid and probes at 65° C. for 10-18 hours in a small reaction volume (about 20 μL) containing stringent hybridization buffer. The capture yield may be expressed in absolute terms or, for a plurality of collections of probes, relative terms. When capture yields for a plurality of sets of target regions are compared, they are normalized for the footprint size of the target region set (e.g., on a per-kilobase basis). Thus, for example, if the footprint sizes of first and second target region sets are 50 kb and 500 kb, respectively (giving a normalization factor of 0.1), then the DNA corresponding to the first target region set is captured with a higher yield than DNA corresponding to the second target region set when the mass per volume concentration of the captured DNA corresponding to the first target region set is more than 0.1 times the mass per volume concentration of the captured DNA corresponding to the second target region set. As a further example, using the same footprint sizes, if the captured DNA corresponding to the first target region set has a mass per volume concentration of 0.2 times the mass per volume concentration of the captured DNA corresponding to the second target region set, then the DNA corresponding to the first target region set was captured with a two-fold greater capture yield than the DNA corresponding to the second target region set.
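The footprint normalization described above can be sketched as follows. The function name and argument names are assumptions made for illustration; the concentrations are in arbitrary but matching units, and only their ratio matters.

```python
def relative_capture_yield(
    conc_first: float,
    conc_second: float,
    footprint_first_kb: float,
    footprint_second_kb: float,
) -> float:
    """Per-kilobase capture yield of the first target region set divided by
    that of the second (hypothetical helper, not part of the disclosure)."""
    if min(conc_second, footprint_first_kb, footprint_second_kb) <= 0:
        raise ValueError("second concentration and footprints must be positive")
    per_kb_first = conc_first / footprint_first_kb
    per_kb_second = conc_second / footprint_second_kb
    return per_kb_first / per_kb_second


# Footprints of 50 kb and 500 kb, as in the example above.
# Concentration ratio of 0.1: the break-even point (equal per-kb yield).
break_even = relative_capture_yield(0.1, 1.0, 50.0, 500.0)
# Concentration ratio of 0.2: about a two-fold greater yield for the first set.
two_fold = relative_capture_yield(0.2, 1.0, 50.0, 500.0)
```

A returned value above 1 indicates that the first target region set was captured with the higher normalized yield.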

“Capturing” one or more target nucleic acids refers to preferentially isolating or separating the one or more target nucleic acids from non-target nucleic acids.

A “captured set” of nucleic acids refers to nucleic acids that have undergone capture.

A “target-region set” or “set of target regions” refers to a plurality of genomic loci targeted for capture and/or targeted by a set of probes (e.g., through sequence complementarity).

“Corresponding to a target region set” means that a nucleic acid, such as cfDNA, originated from a locus in the target region set or specifically binds one or more probes for the target-region set.

“Specifically binds” in the context of a probe or other oligonucleotide and a target sequence means that under appropriate hybridization conditions, the oligonucleotide or probe hybridizes to its target sequence, or replicates thereof, to form a stable probe:target hybrid, while at the same time formation of stable probe:non-target hybrids is minimized. Thus, a probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a non-target sequence, to enable capture or detection of the target sequence. Appropriate hybridization conditions are well-known in the art, may be predicted based on sequence composition, or can be determined by using routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).

“Sequence-variable target region set” refers to a set of target regions that may exhibit changes in sequence such as nucleotide substitutions (i.e., single nucleotide variations), insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells).

“Epigenetic target region set” refers to a set of target regions that may show sequence-independent changes in neoplastic cells (e.g., tumor cells and cancer cells) or that may show sequence-independent changes in cfDNA from subjects having cancer relative to cfDNA from healthy subjects. Examples of sequence-independent changes include, but are not limited to, changes in methylation (increases or decreases), nucleosome distribution, CTCF binding, transcription start sites, and regulatory protein binding regions. For present purposes, loci susceptible to neoplasia-, tumor-, or cancer-associated focal amplifications and/or gene fusions may also be included in an epigenetic target region set because detection of a change in copy number by sequencing or a fused sequence that maps to more than one locus in a reference genome tends to be more similar to detection of exemplary epigenetic changes discussed above than detection of nucleotide substitutions, insertions, or deletions, e.g., in that the focal amplifications and/or gene fusions can be detected at a relatively shallow depth of sequencing because their detection does not depend on the accuracy of base calls at one or a few individual positions.

A nucleic acid is “produced by a tumor,” or is ctDNA or circulating tumor DNA, if it originated from a tumor cell. Tumor cells are neoplastic cells that originated from a tumor, regardless of whether they remain in the tumor or become separated from the tumor (as in the cases, e.g., of metastatic cancer cells and circulating tumor cells).

The term “methylation” or “DNA methylation” refers to addition of a methyl group to a nucleotide base in a nucleic acid molecule. In some embodiments, methylation refers to addition of a methyl group to a cytosine at a CpG site (cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in a 5′→3′ direction of the nucleic acid sequence). In some embodiments, DNA methylation refers to addition of a methyl group to adenine, such as in N⁶-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the 5th carbon of the 6-carbon ring of cytosine). In some embodiments, 5-methylation refers to addition of a methyl group to the 5C position of the cytosine to create 5-methylcytosine (5mC). In some embodiments, methylation comprises a derivative of 5mC. Derivatives of 5mC include, but are not limited to, 5-hydroxymethylcytosine (5-hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the 3rd carbon of the 6-carbon ring of cytosine). In some embodiments, 3C methylation comprises addition of a methyl group to the 3C position of the cytosine to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormality in methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer.

The term “hypermethylation” refers to an increased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypermethylated DNA can include DNA molecules comprising at least 1 methylated residue, at least 2 methylated residues, at least 3 methylated residues, at least 5 methylated residues, or at least 10 methylated residues.

The term “hypomethylation” refers to a decreased level or degree of methylation of nucleic acid molecule(s) relative to the other nucleic acid molecules within a population (e.g., sample) of nucleic acid molecules. In some embodiments, hypomethylated DNA includes unmethylated DNA molecules. In some embodiments, hypomethylated DNA can include DNA molecules comprising 0 methylated residues, at most 1 methylated residue, at most 2 methylated residues, at most 3 methylated residues, at most 4 methylated residues, or at most 5 methylated residues.

The term “methylation-dependent nuclease” refers to a nuclease that preferentially cuts methylated DNA relative to unmethylated DNA. For example, a methylation-dependent nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-dependent nuclease is at least 10, 20, 50, or 100-fold higher on a methylated recognition site relative to an unmethylated control in a standard nucleolysis assay. Methylation-dependent nucleases include methylation-dependent restriction enzymes.

As used herein, “methylation-dependent restriction enzyme” or “MDRE” refers to a restriction enzyme that is dependent on methylation of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, the methylation-dependent restriction enzymes do not cleave the DNA if a particular nucleotide base is unmethylated at the recognition sequence. For example, MspJI is a methylation-dependent restriction enzyme with a recognition sequence “mCNNR(N9)” and it does not cleave DNA in the absence of the methylated cytosine (mC) in the recognition sequence.

The term “methylation-sensitive nuclease” refers to a nuclease that preferentially cuts unmethylated DNA relative to methylated DNA. For example, a methylation-sensitive nuclease may cut at or near a recognition sequence such as a restriction site in a manner dependent on lack of methylation of at least one of the nucleobases in the recognition sequence, such as a cytosine. In some embodiments, the nucleolytic activity of the methylation-sensitive nuclease is at least 10, 20, 50, or 100-fold higher on an unmethylated recognition site relative to a methylated control in a standard nucleolysis assay. Methylation-sensitive nucleases include methylation-sensitive restriction enzymes.

As used herein, “methylation-sensitive restriction enzyme” or “MSRE” refers to a restriction enzyme that is sensitive to the methylation status of the DNA (e.g., cytosine methylation), i.e., the presence or absence of a methyl group in a nucleotide base alters the rate at which the enzyme cleaves the target DNA. In some embodiments, the methylation-sensitive restriction enzymes do not cleave the DNA if a particular nucleotide base is methylated at the recognition sequence. For example, HpaII is a methylation-sensitive restriction enzyme with a recognition sequence “CCGG” and it does not cleave DNA if the second cytosine in the recognition sequence is methylated.

As used herein, “digestion efficiency” or “cutting efficiency” refers to the efficiency of restriction enzyme digestion. The digestion efficiency can be calculated based on the number of control molecules observed upon digesting with restriction enzyme and the number of control molecules observed in the absence of restriction enzyme digestion. The MSRE digestion efficiency can be calculated by: Efficiency = 1 − (number of negative control molecules_([MSRE]) / number of negative control molecules_([Mock])). The MDRE digestion efficiency can be calculated by: Efficiency = 1 − (number of positive control molecules_([MDRE]) / number of positive control molecules_([Mock1])).
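The efficiency formulas above can be sketched as a single helper, since the MSRE and MDRE cases differ only in which control molecules are counted (negative, i.e., unmethylated, controls for an MSRE; positive, i.e., methylated, controls for an MDRE). The function name and the molecule counts are hypothetical and are not data from the disclosure.

```python
def digestion_efficiency(control_count_digested: int, control_count_mock: int) -> float:
    """Efficiency = 1 - (control molecules observed after digestion /
    control molecules observed in the mock, enzyme-free reaction)."""
    if control_count_mock <= 0:
        raise ValueError("mock reaction must contain at least one control molecule")
    if control_count_digested < 0:
        raise ValueError("counts cannot be negative")
    return 1.0 - control_count_digested / control_count_mock


# Hypothetical counts for illustration only.
# MSRE: negative (unmethylated) controls, 20 survive digestion out of 1000 in mock.
msre_efficiency = digestion_efficiency(20, 1000)
# MDRE: positive (methylated) controls, 50 survive digestion out of 1000 in mock.
mdre_efficiency = digestion_efficiency(50, 1000)
```

An efficiency near 1 indicates that nearly all control molecules expected to be cut were removed by the digestion.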

As used herein, “methylation status” can refer to the presence or absence of a methyl group on a DNA base (e.g., cytosine) at a particular genomic position in a nucleic acid molecule. It can also refer to the degree of methylation in a nucleic acid sequence (e.g., highly methylated, low methylated, intermediately methylated or unmethylated nucleic acid molecules). The methylation status can also refer to the number of nucleotides methylated in a particular nucleic acid molecule.

As used herein, “mutation” refers to a variation from a known reference sequence and includes mutations such as, for example, single nucleotide variants (SNVs), and insertions or deletions (indels). A mutation can be a germline or somatic mutation. In some embodiments, a reference sequence for purposes of comparison is a wildtype genomic sequence of the species of the subject providing a test sample, typically the human genome.

As used herein, the terms “neoplasm” and “tumor” are used interchangeably. They refer to abnormal growth of cells in a subject. A neoplasm or tumor can be benign, potentially malignant, or malignant. A malignant tumor is referred to as a cancer or a cancerous tumor.

As used herein, “next-generation sequencing” or “NGS” refers to sequencing technologies having increased throughput as compared to traditional Sanger- and capillary electrophoresis-based approaches, for example, with the ability to generate hundreds of thousands of relatively small sequence reads at a time. Some examples of next-generation sequencing techniques include, but are not limited to, sequencing by synthesis, sequencing by ligation, and sequencing by hybridization. In some embodiments, next-generation sequencing includes the use of instruments capable of sequencing single molecules. Examples of commercially available instruments for performing next-generation sequencing include, but are not limited to, NextSeq, HiSeq, NovaSeq, MiSeq, Ion PGM and Ion GeneStudio S5.

As used herein, “nucleic acid tag” refers to a short nucleic acid (e.g., less than about 500 nucleotides, about 100 nucleotides, about 50 nucleotides, or about 10 nucleotides in length), used to distinguish nucleic acids from different samples (e.g., representing a sample index), distinguish nucleic acids from different partitions (e.g., representing a partition tag) or different nucleic acid molecules in the same sample (e.g., representing a molecular barcode), of different types, or which have undergone different processing. The nucleic acid tag comprises a predetermined, fixed, non-random, random or semi-random oligonucleotide sequence. Such nucleic acid tags may be used to label different nucleic acid molecules or different nucleic acid samples or sub-samples. Nucleic acid tags can be single-stranded, double-stranded, or at least partially double-stranded. Nucleic acid tags optionally have the same length or varied lengths. Nucleic acid tags can also include double-stranded molecules having one or more blunt-ends, include 5′ or 3′ single-stranded regions (e.g., an overhang), and/or include one or more other single-stranded regions at other locations within a given molecule. Nucleic acid tags can be attached to one end or to both ends of the other nucleic acids (e.g., sample nucleic acids to be amplified and/or sequenced). Nucleic acid tags can be decoded to reveal information such as the sample of origin, form, or processing of a given nucleic acid. For example, nucleic acid tags can also be used to enable pooling and/or parallel processing of multiple samples comprising nucleic acids bearing different molecular barcodes and/or sample indexes in which the nucleic acids are subsequently being deconvolved by detecting (e.g., reading) the nucleic acid tags. Nucleic acid tags can also be referred to as identifiers (e.g., molecular identifier, sample identifier).
Additionally, or alternatively, nucleic acid tags can be used as molecular identifiers (e.g., to distinguish between different molecules or amplicons of different parent molecules in the same sample or sub-sample). This includes, for example, uniquely tagging different nucleic acid molecules in a given sample, or non-uniquely tagging such molecules. In the case of non-unique tagging applications, a limited number of tags (i.e., molecular barcodes) may be used to tag each nucleic acid molecule such that different molecules can be distinguished based on their endogenous sequence information (for example, start and/or stop positions where they map to a selected reference genome, a sub-sequence of one or both ends of a sequence, and/or length of a sequence) in combination with at least one molecular barcode. Typically, a sufficient number of different molecular barcodes are used such that there is a low probability (e.g., less than about a 10%, less than about a 5%, less than about a 1%, or less than about a 0.1% chance) that any two molecules may have the same endogenous sequence information (e.g., start and/or stop positions, subsequences of one or both ends of a sequence, and/or lengths) and also have the same molecular barcode.
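One way to reason about how many molecular barcodes are “sufficient” for non-unique tagging is a birthday-problem estimate. The sketch below is an illustration under stated assumptions, not the method of the disclosure: it assumes that a known number of molecules already share the same endogenous sequence information and that each receives a barcode drawn uniformly and independently from the barcode set. The function names are hypothetical.

```python
def barcode_collision_probability(num_molecules: int, num_barcodes: int) -> float:
    """Probability that at least two of `num_molecules` molecules sharing the
    same endogenous sequence information also receive the same barcode,
    assuming uniform, independent barcode assignment (an assumption)."""
    if num_barcodes <= 0 or num_molecules < 0:
        raise ValueError("num_barcodes must be positive and num_molecules non-negative")
    if num_molecules > num_barcodes:
        return 1.0  # pigeonhole: a shared barcode is unavoidable
    p_all_distinct = 1.0
    for i in range(num_molecules):
        p_all_distinct *= (num_barcodes - i) / num_barcodes
    return 1.0 - p_all_distinct


def min_barcodes_for_threshold(num_molecules: int, max_probability: float) -> int:
    """Smallest barcode count whose collision probability is below the threshold."""
    if not 0.0 < max_probability <= 1.0:
        raise ValueError("max_probability must be in (0, 1]")
    n = 1
    while barcode_collision_probability(num_molecules, n) >= max_probability:
        n += 1
    return n


# Two molecules with identical start/stop positions, target below a 1% chance.
needed = min_barcodes_for_threshold(2, 0.01)
```

In practice the number of molecules sharing the same endogenous information depends on sample input and fragment-size distribution, which this sketch treats as a given input.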

As used herein, “partitioning” refers to physically separating or fractionating a mixture of nucleic acid molecules in a sample based on a characteristic of the nucleic acid molecules. The partitioning can be physical partitioning of molecules. Partitioning can involve separating the nucleic acid molecules into groups or sets based on the level of an epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned based on the level of methylation of the nucleic acid molecules. In some embodiments, the methods and systems used for partitioning may be found in PCT Patent Application No. PCT/US2017/068329, which is hereby incorporated by reference in its entirety.

As used herein, “partitioned set” or “partition” refers to a set of nucleic acid molecules partitioned into a set or group based on the differential binding affinity of the nucleic acid molecules or proteins associated with the nucleic acid molecules to a binding agent. A partitioned set may also be referred to as a subsample. The binding agent binds preferentially to the nucleic acid molecules comprising nucleotides with epigenetic modification. For example, if the epigenetic modification is methylation, the binding agent can be a methyl binding domain (MBD) protein. In some embodiments, a partitioned set can comprise nucleic acid molecules belonging to a particular level or degree of epigenetic feature (e.g., methylation). For example, the nucleic acid molecules can be partitioned into three sets: one set for highly methylated nucleic acid molecules (first subsample, hyper partition, hyper partitioned set or hypermethylated partitioned set), a second set for low methylated nucleic acid molecules (second subsample, hypo partition, hypo partitioned set or hypomethylated partitioned set), and a third set for intermediate methylated nucleic acid molecules (third subsample, intermediate partitioned set, intermediately methylated partitioned set, residual partition, or residual partitioned set). In another example, the nucleic acid molecules can be partitioned based on the number of methylated nucleotides: one partitioned set can have nucleic acid molecules with nine methylated nucleotides, and another partitioned set can have unmethylated nucleic acid molecules (zero methylated nucleotides).

As used herein, “polynucleotide”, “nucleic acid”, “nucleic acid molecule”, or “oligonucleotide” refers to a linear polymer of nucleosides (including deoxyribonucleosides, ribonucleosides, or analogs thereof) joined by inter-nucleosidic linkages. Typically, a polynucleotide comprises at least three nucleosides. Oligonucleotides often range in size from a few monomeric units, e.g., 3-4, to hundreds of monomeric units. Whenever a polynucleotide is represented by a sequence of letters, such as “ATGCCTG”, the nucleotides are in 5′→3′ order from left to right, and in the case of DNA, “A” denotes deoxyadenosine, “C” denotes deoxycytidine, “G” denotes deoxyguanosine, and “T” denotes deoxythymidine, unless otherwise noted. The letters A, C, G, and T may be used to refer to the bases themselves, to nucleosides, or to nucleotides comprising the bases.

As used herein, “processing” refers to a set of steps used to generate a library of nucleic acids that is suitable for sequencing. The set of steps can include, but is not limited to, partitioning, end repairing, addition of sequencing adapters, tagging, and/or PCR amplification of nucleic acids.

As used herein, “quantitative measure” refers to an absolute or relative measure. A quantitative measure can be, without limitation, a number, a statistical measurement (e.g., frequency, mean, median, standard deviation, or quantile), or a degree or a relative quantity (e.g., high, medium, and low). A quantitative measure can be a ratio of two quantitative measures. A quantitative measure can be a linear combination of quantitative measures. A quantitative measure may be a normalized measure.

As used herein, “reference sequence” refers to a known sequence used for purposes of comparison with experimentally determined sequences. For example, a known sequence can be an entire genome, a chromosome, or any segment thereof. A reference sequence can align with a single contiguous sequence of a genome or chromosome or chromosome arm or can include non-contiguous segments that align with different regions of a genome or chromosome. Examples of reference sequences include human genomes, such as hg19 and hg38.

As used herein, “restriction enzyme” refers to an enzyme that recognizes and cleaves DNA at or near a specific recognition site.

As used herein, “sample” means anything capable of being analyzed by the methods and/or systems disclosed herein.

As used herein, “sequencing” refers to any of a number of technologies used to determine the sequence (e.g., the identity and order of monomer units) of a biomolecule, e.g., a nucleic acid such as DNA or RNA. Examples of sequencing methods include, but are not limited to, targeted sequencing, single molecule real-time sequencing, exon or exome sequencing, intron sequencing, electron microscopy-based sequencing, panel sequencing, transistor-mediated sequencing, direct sequencing, random shotgun sequencing, Sanger dideoxy termination sequencing, whole-genome sequencing, sequencing by hybridization, pyrosequencing, duplex sequencing, cycle sequencing, single-base extension sequencing, solid-phase sequencing, high-throughput sequencing, massively parallel signature sequencing, emulsion PCR, co-amplification at lower denaturation temperature-PCR (COLD-PCR), multiplex PCR, sequencing by reversible dye terminator, paired-end sequencing, near-term sequencing, exonuclease sequencing, sequencing by ligation, short-read sequencing, single-molecule sequencing, sequencing-by-synthesis, real-time sequencing, reverse-terminator sequencing, nanopore sequencing, 454 sequencing, Solexa Genome Analyzer sequencing, SOLiD™ sequencing, MS-PET sequencing, and a combination thereof. In some embodiments, sequencing can be performed by a gene analyzer such as, for example, gene analyzers commercially available from Illumina, Inc., Pacific Biosciences, Inc., or Applied Biosystems/Thermo Fisher Scientific, among many others.

As used herein, “sequence information” in the context of a nucleic acid polymer means the order and identity of monomer units (e.g., nucleotides, etc.) in that polymer.

As used herein, “sequence-variable target region set” refers to a set of target regions that may exhibit changes in sequence such as nucleotide substitutions, insertions, deletions, or gene fusions or transpositions in neoplastic cells (e.g., tumor cells and cancer cells).

As used herein, the terms “somatic mutation” and “somatic variation” are used interchangeably. They refer to a mutation in the genome that occurs after conception. Somatic mutations can occur in any cell of the body except germ cells and, accordingly, are not passed on to progeny.

As used herein, “specifically binds” in the context of a probe or other oligonucleotide and a target sequence means that under appropriate hybridization conditions, the oligonucleotide or probe hybridizes to its target sequence, or replicates thereof, to form a stable probe:target hybrid, while at the same time formation of stable probe:non-target hybrids is minimized. Thus, a probe hybridizes to a target sequence or replicate thereof to a sufficiently greater extent than to a non-target sequence, to enable capture or detection of the target sequence. Appropriate hybridization conditions are well-known in the art, may be predicted based on sequence composition, or can be determined by using routine testing methods (see, e.g., Sambrook et al., Molecular Cloning, A Laboratory Manual, 2nd ed. (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 1989) at §§ 1.90-1.91, 7.37-7.57, 9.47-9.51 and 11.47-11.57, particularly §§ 9.50-9.51, 11.12-11.13, 11.45-11.47 and 11.55-11.57, incorporated by reference herein).

As used herein, “subject” refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species, or other organism, such as a plant. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals). A subject can be a healthy individual, an individual that has or is suspected of having a disease or a predisposition to the disease, or an individual in need of therapy or suspected of needing therapy. The terms “individual” or “patient” are intended to be interchangeable with “subject”. For example, a subject can be an individual who has been diagnosed as having a cancer, is going to receive a cancer therapy, and/or has received at least one cancer therapy. The subject can be in remission of a cancer. As another example, the subject can be an individual who is diagnosed as having an autoimmune disease. As another example, the subject can be a female individual who is pregnant or who is planning on getting pregnant, who may have been diagnosed with or suspected of having a disease, e.g., a cancer or an autoimmune disease.

As used herein, “target-region set” or “set of target regions” or “target regions” or “target regions of interest” or “regions of interest” or “genomic regions of interest” refers to a plurality of genomic loci or a plurality of genomic regions targeted for capture and/or targeted by a set of probes (e.g., through sequence complementarity).

As used herein, “tumor fraction” refers to the proportion of cfDNA molecules that originated from tumor cells for a given sample, or sample-region pair.

The terms “or a combination thereof” and “or combinations thereof” as used herein refer to any and all permutations and combinations of the listed terms preceding the term. For example, “A, B, C, or combinations thereof” is intended to include at least one of: A, B, C, AB, AC, BC, or ABC, and if order is important in a particular context, also BA, CA, CB, ACB, CBA, BCA, BAC, or CAB. Continuing with this example, expressly included are combinations that contain repeats of one or more item or term, such as BB, AAA, AAB, BBC, AAABCCCC, CBBAAA, CABABB, and so forth. The skilled artisan will understand that typically there is no limit on the number of items or terms in any combination, unless otherwise apparent from the context.

“Or” is used in the inclusive sense, i.e., equivalent to “and/or,” unless the context requires otherwise.

B. Exemplary Methods

1. Overview

Cancer formation and progression may arise from both genetic modification and epigenetic features of deoxyribonucleic acid (DNA). The present disclosure provides methods and systems for analyzing DNA, such as cell-free DNA (cfDNA). The present disclosure provides methods and systems for improving the signal-to-noise ratio of methylation partitioning assays.

Without wishing to be bound by any particular theory, cells in or around a cancer or neoplasm may shed more DNA than cells of the same tissue type in a healthy subject. As such, the distribution of tissue of origin of certain DNA samples, such as cfDNA, may change upon carcinogenesis. Thus, for example, an increase in the level of hypermethylation variable target regions that show lower methylation in healthy cfDNA than in at least one other tissue type can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer. Similarly, an increase in the level of hypomethylation variable target regions in the sample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

Additionally, cancer can be indicated by non-sequence modifications, such as methylation. Examples of methylation changes in cancer include local gains of DNA methylation in the CpG islands at the TSS of genes involved in normal growth control, DNA repair, cell cycle regulation, and/or cell differentiation. This hypermethylation can be associated with an aberrant loss of transcriptional capacity of involved genes and occurs at least as frequently as point mutations and deletions as a cause of altered gene expression.

Thus, DNA methylation profiling can be used to detect aberrant methylation in DNA of a sample. The DNA can correspond to certain genomic regions (“differentially methylated regions” or “DMRs”) that are normally hypermethylated or hypomethylated in a given sample type (e.g., cfDNA from the bloodstream) but which may show an abnormal degree of methylation that correlates to a neoplasm or cancer, e.g., because of unusually increased contributions of tissues to the type of sample (e.g., due to increased shedding of DNA in or around the neoplasm or cancer) and/or from extents of methylation of the genome that are altered during development or that are perturbed by disease, for example, cancer or any cancer-associated disease.

In some embodiments, DNA methylation comprises addition of a methyl group to a cytosine residue at a CpG site (cytosine-phosphate-guanine site, i.e., a cytosine followed by a guanine in a 5′→3′ direction of the nucleic acid sequence). In some embodiments, DNA methylation comprises addition of a methyl group to an adenine residue, such as in N6-methyladenine. In some embodiments, DNA methylation is 5-methylation (modification of the 5th carbon of the 6-carbon ring of cytosine). In some embodiments, 5-methylation comprises addition of a methyl group to the 5C position of the cytosine residue to create 5-methylcytosine (m5C or 5-mC or 5mC). In some embodiments, methylation comprises a derivative of m5C. Derivatives of m5C include, but are not limited to, 5-hydroxymethylcytosine (5-hmC or 5hmC), 5-formylcytosine (5-fC), and 5-carboxylcytosine (5-caC). In some embodiments, DNA methylation is 3C methylation (modification of the 3rd carbon of the 6-carbon ring of the cytosine residue). In some embodiments, 3C methylation comprises addition of a methyl group to the 3C position of the cytosine residue to generate 3-methylcytosine (3mC). Methylation can also occur at non-CpG sites; for example, methylation can occur at a CpA, CpT, or CpC site. DNA methylation can change the activity of a methylated DNA region. For example, when DNA in a promoter region is methylated, transcription of the gene may be repressed. DNA methylation is critical for normal development, and abnormality in methylation may disrupt epigenetic regulation. The disruption, e.g., repression, in epigenetic regulation may cause diseases, such as cancer. Promoter methylation in DNA may be indicative of cancer.

Methylation profiling can involve determining methylation patterns across different regions of the genome. For example, after partitioning molecules based on extent of methylation (e.g., relative number of methylated nucleotides per molecule) and sequencing, the sequences of molecules in the different partitions can be mapped to a reference genome. This can show regions of the genome that, compared with other regions, are more highly methylated or are less highly methylated. In this way, genomic regions, in contrast to individual molecules, may differ in their extent of methylation.

Combining the signals obtained from methylation profiling with the signals obtained from somatic variations (e.g., SNV, indel, CNV, and gene fusions) facilitates the detection of cancer.

Nucleic acid molecules in a sample may be fractionated or partitioned based on methylation status of the nucleic acid molecules. Partitioning nucleic acid molecules in a sample can increase a rare signal. For example, a genetic variation present in hypermethylated DNA but less (or not) present in hypomethylated DNA can be more easily detected by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing multiple fractions of a sample, a multi-dimensional analysis of a single molecule can be performed and hence, greater sensitivity can be achieved. Partitioning may include physically partitioning nucleic acid molecules into subsets or groups based on the presence or absence of one or more methylated nucleotides. A sample may be fractionated or partitioned into one or more partitioned sets based on a characteristic that is indicative of differential gene expression or a disease state. A sample may be fractionated based on a characteristic, or combination thereof, that provides a difference in signal between a normal and diseased state during analysis of nucleic acids, e.g., cell free DNA (“cfDNA”), non-cfDNA, tumor DNA, circulating tumor DNA (“ctDNA”) and cell free nucleic acids (“cfNA”).

Partitioning procedures may result in imperfect sorting of DNA molecules among the subsamples. For example, a minority of the molecules in the second subsample may be highly modified (e.g., hypermethylated), and/or a minority of the molecules in the first subsample may be unmodified or mostly unmodified (e.g., unmethylated or mostly unmethylated). Highly modified molecules in the second subsample and unmodified or mostly unmodified molecules in the first subsample are considered nonspecifically partitioned. The methods described herein comprise steps that can reduce technical noise from nonspecifically partitioned DNA, e.g., by degrading it and/or by converting certain bases such that nonspecifically partitioned DNA can be identified following sequencing. Thus, the methods described herein can provide improved sensitivity and/or streamlined analysis.

FIG. 2 illustrates an example embodiment of a method 200 for determining the methylation status of nucleic acid molecules in a sample obtained from a subject. In 202, a polynucleotides sample is obtained from the subject. In some embodiments, the sample is a DNA sample obtained from a tumor tissue biopsy. In some embodiments, the sample is a cell-free DNA (cfDNA) sample obtained from blood. In 204, the polynucleotides sample is partitioned into at least two partitioned sets (subsamples). In some embodiments, the partitioning comprises partitioning the nucleic acid molecules based on a differential binding affinity of the polynucleotides to a binding agent that preferentially binds to polynucleotides comprising methylated nucleotides.

In 206, the nucleic acid molecules in at least one partitioned set are digested with at least one methylation-dependent nuclease, such as a methylation-dependent restriction enzyme (MDRE). In some embodiments, the nucleic acids in at least one partitioned set are digested with at least two MDREs. In some embodiments, two MDREs are used for digesting the nucleic acid molecules in at least one partitioned set. Any of the MDREs or combinations thereof described elsewhere herein may be used.

In some embodiments, prior to restriction digestion with MDRE, at least one adapter is attached to at least one end of the nucleic acid molecules (i.e., 5′ and/or 3′ ends of the DNA molecule). In other embodiments, after the digestion but prior to enriching in 208, at least one adapter is attached to at least one end of the nucleic acid molecules. In some embodiments, the adapter is resistant to digestion by the methylation dependent nucleases or restriction enzymes, e.g., due to the presence of unmethylated nucleotides or appropriate nucleotide analogs (e.g., nucleotide analogs with linkage modifications, such as phosphorothioate).

In 208, after MDRE digestion, the nucleic acid molecules in the one or more partitioned sets can be enriched for genomic regions of interest. Alternatively, an enrichment step can be performed before the partitioning step. In some embodiments, the genomic regions of interest can comprise differentially methylated regions (e.g., a hypermethylation variable target region set and/or hypomethylation variable target region set) for cancer detection. In 210, at least a subset of the enriched molecules is sequenced by a next generation sequencer. In 212, the sequencing reads generated by the sequencer are then analyzed using bioinformatic tools/algorithms to determine the number of molecules in the one or more partitioned sets, which in turn is used to determine the methylation status at one or more genetic loci of the nucleic acid molecules in at least one partitioned set. In some embodiments, the one or more genetic loci can comprise multiple genetic loci. In some embodiments, the one or more genetic loci can comprise one or more genomic regions. In some embodiments, the genomic regions can be promoter regions of genes. In some embodiments, prior to sequencing, the nucleic acid molecules can be amplified via PCR amplification. In some embodiments, the primers used in the amplification can comprise at least one sample index.

FIG. 3 illustrates an example embodiment of a method 300 for detecting the presence or absence of cancer in a subject according to an embodiment of the disclosure. In 302, a polynucleotides sample is obtained from the subject. In some embodiments, the polynucleotides sample is a DNA sample obtained from a tumor tissue biopsy. In some embodiments, the polynucleotides sample is a cell-free DNA (cfDNA) sample obtained from blood. In 304, the polynucleotides sample is partitioned into at least two partitioned sets. In some embodiments, the partitioning comprises partitioning the nucleic acid molecules based on a differential binding affinity of the polynucleotides to a binding agent that preferentially binds to polynucleotides comprising methylated nucleotides. Examples of binding agents include, but are not limited to, methyl binding domains (MBDs) and methyl binding proteins (MBPs), which are discussed in detail elsewhere herein.

In 306, adapters are attached to the nucleic acid molecules in the one or more partitioned sets, wherein the adapter comprises at least one tag and is attached to at least one end of the nucleic acid molecules (i.e., 5′ and/or 3′ ends of the DNA molecule). In some embodiments, the adapter is resistant to digestion by the methylation dependent restriction enzymes. In some embodiments, the adapter comprises unmethylated nucleotides. In some embodiments, the adapter comprises one or more nucleotide analogs resistant to methylation dependent restriction enzymes. In some embodiments, the adapter comprises a nucleotide sequence not recognized by methylation dependent restriction enzymes. In some embodiments, the tags may be provided as components of adapters. In some embodiments, the tag comprises a molecular barcode (i.e., molecule identifier). In some embodiments, the tag attached to nucleic acid molecules in one partitioned set is different from the tag attached to nucleic acid molecules in the other partitioned set(s). In some embodiments, one partitioned set is differentially tagged from the other partitioned set(s). Differential tagging of the partitioned sets helps in keeping track of the nucleic acid molecules belonging to a particular partitioned set. The nucleic acid molecules in different partitioned sets receive different tags that can distinguish members of one partitioned set from another. The tags linked to nucleic acid molecules of the same partitioned set can be the same or different from one another. But if different from one another, the tags can have part of their sequence in common so as to identify the molecules to which they are attached as being of a particular partitioned set. For example, if the molecules of the sample are partitioned into two partitioned sets, P1 and P2, then the molecules in P1 can be tagged with A1, A2, A3, and so forth, and the molecules in P2 can be tagged with B1, B2, B3, and so forth. Such a tagging system allows distinguishing between the partitioned sets and between the molecules within a partitioned set. In some embodiments, the tag comprises a partition tag (i.e., partition identifier). In such embodiments, the nucleic acid molecules within a partitioned set receive the same partition tag, which is different from the partition tag attached to the nucleic acid molecules of the other partitioned set(s).
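The A1/A2/A3 and B1/B2/B3 tagging example above can be sketched in code. The following is a minimal illustration only, assuming a hypothetical tag layout in which the first character of each tag identifies the partitioned set and the remainder identifies the molecule; the names `PARTITION_BY_PREFIX` and `sort_reads_by_partition` are illustrative and are not part of the disclosure.

```python
from collections import defaultdict

# Assumption: the first character of a tag is the shared partition portion
# ("A" for P1, "B" for P2); the rest distinguishes molecules within the set.
PARTITION_BY_PREFIX = {"A": "P1", "B": "P2"}


def sort_reads_by_partition(tagged_reads):
    """Group (tag, sequence) pairs by partitioned set, then by full tag."""
    partitions = defaultdict(lambda: defaultdict(list))
    for tag, sequence in tagged_reads:
        partition = PARTITION_BY_PREFIX[tag[0]]
        partitions[partition][tag].append(sequence)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {p: dict(tags) for p, tags in partitions.items()}


# Toy reads pooled from both partitioned sets after tagging.
reads = [("A1", "ACGT"), ("B1", "TTGA"), ("A2", "ACGG"), ("A1", "ACGT")]
sorted_reads = sort_reads_by_partition(reads)
```

Here the shared prefix recovers the partitioned set of origin after pooling, while the full tag separates molecules within a set.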

In 308, the nucleic acid molecules in at least one partitioned set are digested with at least one methylation dependent nuclease or methylation dependent restriction enzyme (MDRE). In some embodiments, the nucleic acids in at least one partitioned set are digested with at least two MDREs. In some embodiments, two MDREs are used for digesting the nucleic acid molecules in at least one partitioned set. The MDRE may be any MDRE or combination thereof described elsewhere herein.

In 310, after MDRE digestion, the nucleic acid molecules in the one or more partitioned sets can be enriched for genomic regions of interest. Alternatively, an enrichment step can be performed before the partitioning step. In some embodiments, the genomic regions of interest can comprise differentially methylated regions for cancer detection. In 312, at least a subset of the enriched molecules is sequenced by a next generation sequencer. In 314, the sequencing reads generated by the sequencer are then analyzed using bioinformatic tools/algorithms to determine the number of molecules in the one or more partitioned sets, which in turn is used to determine the methylation status at one or more genetic loci of the nucleic acid molecules in at least one partitioned set. In some embodiments, the one or more genetic loci can comprise multiple genetic loci. In some embodiments, the one or more genetic loci can comprise one or more genomic regions. In some embodiments, the genomic regions can be promoter regions of genes. In some embodiments, prior to sequencing, the nucleic acid molecules can be amplified via PCR amplification. In some embodiments, the primers used in the amplification can comprise at least one sample index.

In some embodiments, the method can further comprise detecting the presence or absence of cancer in the subject, e.g., based on the methylation status at one or more genetic loci of the nucleic acid molecules in at least one partitioned set. In some embodiments, the method further comprises determining a level of DNA from tumor cells in the polynucleotide sample.

FIG. 4 illustrates an exemplary workflow, e.g., to detect the presence or absence of cancer, according to certain embodiments of the disclosure beginning with a cfDNA sample, in which cfDNA is isolated from a blood sample and the cfDNA sample comprises cfDNA molecules belonging to hypermethylation variable target regions (Hyper DMR), hypomethylation variable target regions (Hypo DMR), and unmethylated control regions. The cfDNA is partitioned using a methyl-binding domain protein (MBD) into hypomethylated and hypermethylated subsamples; each partitioned set is subjected to molecular barcoding to distinguishably tag DNA from the subsamples; the hypo partitioned set is digested with one or more MDREs, cleaving methylated cfDNA molecules at the RE recognition site, and optionally, the hyper partitioned set is digested with one or more MSREs, cleaving the unmethylated cfDNA molecules at the RE recognition site; and then the partitioned sets (including the MDRE digested hypo partitioned set) are pooled, captured, amplified, and sequenced.

2. Partitioning the Sample into a Plurality of Subsamples; Aspects of Samples

In certain embodiments described herein, a population of different forms of nucleic acids (e.g., hypermethylated and hypomethylated DNA in a sample, such as cfDNA) can be physically partitioned based on one or more characteristics of the nucleic acids prior to further analysis, e.g., contacting with a nuclease, differentially modifying or isolating a nucleobase, tagging, and/or sequencing. This approach can be used to determine, for example, whether certain sequences are hypermethylated or hypomethylated. Additionally, by partitioning a heterogeneous nucleic acid population, one may increase rare signals, e.g., by enriching rare nucleic acid molecules that are more prevalent in one fraction (or partition) of the population. For example, a genetic variation present in hypermethylated DNA but less (or not) in hypomethylated DNA can be more easily detected by partitioning a sample into hypermethylated and hypomethylated nucleic acid molecules. By analyzing multiple fractions of a sample, a multi-dimensional analysis of a single locus of a genome or species of nucleic acid can be performed and hence, greater sensitivity can be achieved.

In some instances, a heterogeneous nucleic acid sample is partitioned into two or more partitions (e.g., at least 3, 4, 5, 6 or 7 partitions). Partitions of a sample are also referred to herein as subsamples. In some embodiments, each partition is differentially tagged. Tagged partitions can then be pooled together for collective sample prep and/or sequencing. The partitioning-tagging-pooling steps can occur more than once, with each round of partitioning occurring based on a different characteristic (examples provided herein), and tagged using differential tags that are distinguished from other partitions and partitioning means.

Examples of characteristics that can be used for partitioning include sequence length, methylation level, nucleosome binding, sequence mismatch, immunoprecipitation, and/or proteins that bind to DNA. Resulting partitions can include one or more of the following nucleic acid forms: single-stranded DNA (ssDNA), double-stranded DNA (dsDNA), shorter DNA fragments and longer DNA fragments. In some embodiments, partitioning based on a cytosine modification (e.g., cytosine methylation) or methylation generally is performed and is optionally combined with at least one additional partitioning step, which may be based on any of the foregoing characteristics or forms of DNA. In some embodiments, a heterogeneous population of nucleic acids is partitioned into nucleic acids with one or more epigenetic modifications and without the one or more epigenetic modifications. Examples of epigenetic modifications include presence or absence of methylation; level of methylation; type of methylation (e.g., 5-methylcytosine versus other types of methylation, such as adenine methylation and/or cytosine hydroxymethylation); and association and level of association with one or more proteins, such as histones. Alternatively or additionally, a heterogeneous population of nucleic acids can be partitioned into nucleic acid molecules associated with nucleosomes and nucleic acid molecules devoid of nucleosomes. Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned into single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA). Alternatively or additionally, a heterogeneous population of nucleic acids may be partitioned based on nucleic acid length (e.g., molecules of up to 160 bp and molecules having a length of greater than 160 bp).

In some instances, each partition (representative of a different nucleic acid form) is differentially labelled, and the partitions are pooled together prior to sequencing. In other instances, the different forms are separately sequenced.

In some embodiments, a population of different nucleic acids is partitioned into two or more different partitions. Each partition is representative of a different nucleic acid form, and a first partition (also referred to as a subsample) comprises DNA with a cytosine modification in a greater proportion than a second subsample. Each partition is distinctly tagged. The first subsample is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The tagged nucleic acids are pooled together prior to sequencing. Sequence reads are obtained and analyzed in silico, including to distinguish the first nucleobase from the second nucleobase in the DNA of the first subsample. Tags are used to sort reads from different partitions. Analysis to detect genetic variants can be performed on a partition-by-partition level, as well as on a whole nucleic acid population level. For example, analysis can include in silico analysis to determine genetic variants, such as CNV, SNV, indel, or fusion, in nucleic acids in each partition. In some instances, in silico analysis can include determining chromatin structure. For example, coverage of sequence reads can be used to determine nucleosome positioning in chromatin. Higher coverage can correlate with higher nucleosome occupancy in a genomic region, while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).

Samples can include nucleic acids varying in modifications including post-replication modifications to nucleotides and binding, usually noncovalently, to one or more proteins.

In an embodiment, the population of nucleic acids is one obtained from a serum, plasma or blood sample from a subject suspected of having neoplasia, a tumor, or cancer or previously diagnosed with neoplasia, a tumor, or cancer. The population of nucleic acids includes nucleic acids having varying levels of methylation. Methylation can occur from any one or more post-replication or transcriptional modifications. Post-replication modifications include modifications of the nucleotide cytosine, particularly at the 5-position of the nucleobase, e.g., 5-methylcytosine, 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine.

The affinity agents can be antibodies with the desired specificity, natural binding partners or variants thereof (Bock et al., Nat Biotech 28: 1106-1114 (2010); Song et al., Nat Biotech 29: 68-72 (2011)), or artificial peptides selected, e.g., by phage display to have specificity to a given target.

Examples of capture moieties contemplated herein include methyl binding domains (MBDs) and methyl binding proteins (MBPs) as described herein, including proteins such as MeCP2, an MBD such as MBD2, and antibodies preferentially binding to 5-methylcytosine. Where an antibody is used to immunoprecipitate methylated DNA, the methylated DNA may be recovered in single-stranded form. In such embodiments, a second strand can be synthesized. Hypermethylated (and optionally intermediately methylated) subsamples may then be contacted with a methylation sensitive nuclease that does not cleave hemi-methylated DNA, such as HpaII, BstUI, or Hin6I. Alternatively or in addition, hypomethylated (and optionally intermediately methylated) subsamples may then be contacted with a methylation dependent nuclease that cleaves hemi-methylated DNA.

Likewise, partitioning of different forms of nucleic acids can be performed using histone binding proteins which can separate nucleic acids bound to histones from free or unbound nucleic acids. Examples of histone binding proteins that can be used in the methods disclosed herein include RBBP4, RbAp48 and SANT domain peptides.

Although for some affinity agents and modifications, binding to the agent may occur in an essentially all or none manner depending on whether a nucleic acid bears a modification, the separation may be one of degree. In such instances, nucleic acids overrepresented in a modification bind to the agent to a greater extent than nucleic acids underrepresented in the modification. Alternatively, nucleic acids having modifications may bind in an all or nothing manner, but then various levels of modifications may be sequentially eluted from the binding agent.

For example, in some embodiments, partitioning can be binary or based on degree/level of modifications. For example, all methylated fragments can be partitioned from unmethylated fragments using methyl-binding domain proteins (e.g., MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific)). Subsequently, additional partitioning may involve eluting fragments having different levels of methylation by adjusting the salt concentration in a solution with the methyl-binding domain and bound fragments. As salt concentration increases, fragments having greater methylation levels are eluted.

In some instances, the final partitions are representative of nucleic acids having different extents of modifications (overrepresentative or underrepresentative of modifications). Overrepresentation and underrepresentation can be defined by the number of modifications borne by a nucleic acid relative to the median number of modifications per strand in a population. For example, if the median number of 5-methylcytosine residues in nucleic acid in a sample is 2, a nucleic acid including more than two 5-methylcytosine residues is overrepresented in this modification and a nucleic acid with 1 or zero 5-methylcytosine residues is underrepresented. The effect of the affinity separation is to enrich for nucleic acids overrepresented in a modification in a bound phase and for nucleic acids underrepresented in a modification in an unbound phase (i.e., in solution). The nucleic acids in the bound phase can be eluted before subsequent processing.
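The median-based definition above can be illustrated with a short sketch. It assumes, purely for illustration, that the number of 5-methylcytosine residues per nucleic acid is already known; the function name `classify_by_median` and the label for molecules exactly at the median are assumptions, since the text only defines the over- and underrepresented cases.

```python
from statistics import median


def classify_by_median(mc_counts):
    """Label each molecule by its 5mC count relative to the population median."""
    m = median(mc_counts)
    labels = []
    for count in mc_counts:
        if count > m:
            labels.append("overrepresented")
        elif count < m:
            labels.append("underrepresented")
        else:
            # Assumption: molecules exactly at the median are left unclassified.
            labels.append("at_median")
    return labels


# Toy population whose median 5mC count is 2, as in the example above.
counts = [0, 1, 2, 2, 3, 5]
labels = classify_by_median(counts)
```

With a median of 2, molecules bearing more than two 5-methylcytosine residues are labeled overrepresented and those bearing one or zero are labeled underrepresented, matching the worked example in the text.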

When using MethylMiner Methylated DNA Enrichment Kit (ThermoFisher Scientific), various levels of methylation can be partitioned using sequential elutions. For example, a hypomethylated partition (no methylation) can be separated from a methylated partition by contacting the nucleic acid population with the MBD from the kit, which is attached to magnetic beads. The beads are used to separate out the methylated nucleic acids from the non-methylated nucleic acids. Subsequently, one or more elution steps are performed sequentially to elute nucleic acids having different levels of methylation. For example, a first set of methylated nucleic acids can be eluted at a salt concentration of 160 mM or higher, e.g., at least 150 mM, at least 200 mM, 300 mM, 400 mM, 500 mM, 600 mM, 700 mM, 800 mM, 900 mM, 1000 mM, or 2000 mM. After such methylated nucleic acids are eluted, magnetic separation is once again used to separate higher level of methylated nucleic acids from those with lower level of methylation. The elution and magnetic separation steps can be repeated to create various partitions such as a hypomethylated partition (representative of no methylation), a methylated partition (representative of low level of methylation), and a hypermethylated partition (representative of high level of methylation).

In some methods, nucleic acids bound to an agent used for affinity separation are subjected to a wash step. The wash step washes off nucleic acids weakly bound to the affinity agent. Such nucleic acids can be enriched in nucleic acids having the modification to an extent close to the mean or median (i.e., intermediate between nucleic acids remaining bound to the solid phase and nucleic acids not binding to the solid phase on initial contacting of the sample with the agent).

The affinity separation results in at least two, and sometimes three or more partitions of nucleic acids with different extents of a modification. While the partitions are still separate, the nucleic acids of at least one partition, and usually two or three (or more) partitions are linked to nucleic acid tags, usually provided as components of adapters, with the nucleic acids in different partitions receiving different tags that distinguish members of one partition from another. The tags linked to nucleic acid molecules of the same partition can be the same or different from one another. But if different from one another, the tags may have part of their code in common so as to identify the molecules to which they are attached as being of a particular partition.

For further details regarding partitioning nucleic acid samples based on characteristics such as methylation, see WO2018/119452, which is incorporated herein by reference.

In some embodiments, the nucleic acid molecules can be partitioned into different partitions based on the nucleic acid molecules that are bound to a specific protein or a fragment thereof and those that are not bound to that specific protein or fragment thereof.

Nucleic acid molecules can be partitioned based on DNA-protein binding. Protein-DNA complexes can be partitioned based on a specific property of a protein. Examples of such properties include various epitopes, modifications (e.g., histone methylation or acetylation) or enzymatic activity. Examples of proteins which may bind to DNA and serve as a basis for fractionation may include, but are not limited to, protein A and protein G. Any suitable method can be used to partition the nucleic acid molecules based on protein bound regions. Examples of methods used to partition nucleic acid molecules based on protein bound regions include, but are not limited to, SDS-PAGE, chromatin-immuno-precipitation (ChIP), heparin chromatography, and asymmetrical field flow fractionation (AF4).

In some embodiments, partitioning of the nucleic acids is performed by contacting the nucleic acids with a methylation binding domain (“MBD”) of a methylation binding protein (“MBP”). MBD binds to 5-methylcytosine (5mC). MBD is coupled to paramagnetic beads, such as Dynabeads® M-280 Streptavidin, via a biotin linker. Partitioning into fractions with different extents of methylation can be performed by eluting fractions by increasing the NaCl concentration.

Examples of MBPs contemplated herein include, but are not limited to:

-   (a) MeCP2 and MBD2 are proteins preferentially binding to 5-methyl-cytosine over unmodified cytosine.
-   (b) RPL26, PRP8 and the DNA mismatch repair protein MSH6 preferentially bind to 5-hydroxymethyl-cytosine over unmodified cytosine.
-   (c) FOXK1, FOXK2, FOXP1, FOXP4 and FOXI3 preferably bind to 5-formyl-cytosine over unmodified cytosine (Iurlaro et al., Genome Biol. 14: R119 (2013)).
-   (d) Antibodies specific to one or more methylated nucleotide bases.

In general, elution is a function of number of methylated sites per molecule, with molecules having more methylation eluting under increased salt concentrations. To elute the DNA into distinct populations based on the extent of methylation, one can use a series of elution buffers of increasing NaCl concentration. Salt concentration can range from about 100 mM to about 2500 mM NaCl. In one embodiment, the process results in three (3) partitions. Molecules are contacted with a solution at a first salt concentration and comprising a molecule comprising a methyl binding domain, which molecule can be attached to a capture moiety, such as streptavidin. At the first salt concentration a population of molecules will bind to the MBD and a population will remain unbound. The unbound population can be separated as a “hypomethylated” population. For example, a first partition representative of the hypomethylated form of DNA is that which remains unbound at a low salt concentration, e.g., 100 mM or 160 mM. A second partition representative of intermediate methylated DNA is eluted using an intermediate salt concentration, e.g., between 100 mM and 2000 mM concentration. This is also separated from the sample. A third partition representative of hypermethylated form of DNA is eluted using a high salt concentration, e.g., at least about 2000 mM.
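The three-partition scheme above can be summarized as a simple threshold rule. The sketch below is illustrative only: it assumes that each molecule is characterized by the NaCl concentration at which it is recovered, and it uses 160 mM and 2000 mM (two of the example values given above) as hypothetical cutoffs. The function name `partition_label` and its parameters are not part of the disclosure.

```python
def partition_label(recovery_salt_mM, low_mM=160.0, high_mM=2000.0):
    """Assign a partition from the NaCl concentration (mM) at recovery.

    Assumed rule: unbound at or below low_mM -> hypomethylated;
    eluted below high_mM -> intermediate; otherwise hypermethylated.
    """
    if recovery_salt_mM <= low_mM:
        return "hypomethylated"
    if recovery_salt_mM < high_mM:
        return "intermediate"
    return "hypermethylated"


# Toy recovery concentrations (mM NaCl) for three molecules.
labels = [partition_label(c) for c in (160, 500, 2000)]
```

In practice the cutoffs would be chosen empirically for the binding agent and buffer system in use; the values here only mirror the examples in the preceding paragraph.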

a. Tagging of Partitions

In some embodiments, two or more partitions, e.g., each partition, is/are differentially tagged. Tags or indexes can be molecules, such as nucleic acids, containing information that indicates a feature of the molecule with which the tag is associated. Tags can allow one to differentiate molecules from which sequence reads originated. For example, molecules can bear a sample tag or sample index (which distinguishes molecules in one sample from those in a different sample), a partition tag (which distinguishes molecules in one partition from those in a different partition) or a molecular tag/molecular barcode/barcode (which distinguishes different molecules from one another, in both unique and non-unique tagging scenarios). In certain embodiments, a tag can comprise one or a combination of barcodes. As used herein, the term “barcode” refers to a nucleic acid molecule having a particular nucleotide sequence, or to the nucleotide sequence itself, depending on context. A barcode can have, for example, between 10 and 100 nucleotides. A collection of barcodes can have degenerate sequences or can have sequences having a certain Hamming distance, as desired for the specific purpose. So, for example, a molecular barcode can be comprised of one barcode or a combination of two barcodes, each attached to different ends of a molecule. Additionally or alternatively, for different partitions and/or samples, different sets of molecular barcodes, molecular tags, or molecular indexes can be used such that the barcodes serve as a molecular tag through their individual sequences and also serve to identify the partition and/or sample to which they correspond based on the set of which they are a member. Tags comprising barcodes can be incorporated into or otherwise joined to adapters. Tags can be incorporated by ligation or overlap extension PCR, among other methods.

Tagging strategies can be divided into unique tagging and non-unique tagging strategies. In unique tagging, all or substantially all of the molecules in a sample bear a different tag, so that reads can be assigned to original molecules based on tag information alone. Tags used in such methods are sometimes referred to as “unique tags”. In non-unique tagging, different molecules in the same sample can bear the same tag, so that other information in addition to tag information is used to assign a sequence read to an original molecule. Such information may include start and stop coordinate, coordinate to which the molecule maps, start or stop coordinate alone, etc. Tags used in such methods are sometimes referred to as “non-unique tags”. Accordingly, it is not necessary to uniquely tag every molecule in a sample. It suffices to uniquely tag molecules falling within an identifiable class within a sample. Thus, molecules in different identifiable families can bear the same tag without loss of information about the identity of the tagged molecule.
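As an illustration of non-unique tagging, the sketch below assigns reads to original molecules by combining the tag with the mapped start and stop coordinates. The read representation (a dictionary with `tag`, `start`, `stop`, and `seq` keys) and the function name `group_into_families` are assumptions made for this example only.

```python
from collections import defaultdict


def group_into_families(reads):
    """Group reads that share a tag and the same start/stop coordinates."""
    families = defaultdict(list)
    for read in reads:
        # The tag alone is not unique; the coordinates disambiguate molecules.
        key = (read["tag"], read["start"], read["stop"])
        families[key].append(read["seq"])
    return dict(families)


# Toy reads: tag "T1" is reused on two molecules with different coordinates.
reads = [
    {"tag": "T1", "start": 100, "stop": 265, "seq": "ACGT"},
    {"tag": "T1", "start": 100, "stop": 265, "seq": "ACGT"},
    {"tag": "T1", "start": 400, "stop": 560, "seq": "GGCA"},
    {"tag": "T2", "start": 100, "stop": 265, "seq": "ACGA"},
]
families = group_into_families(reads)
```

The same tag can thus appear in different families without ambiguity, provided that molecules sharing a start and stop coordinate carry different tags.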

In certain embodiments of non-unique tagging, the number of different tags used can be sufficient that there is a very high likelihood (e.g., at least 99%, at least 99.9%, at least 99.99% or at least 99.999%) that all molecules of a particular group bear a different tag. It is to be noted that when barcodes are used as tags, and when barcodes are attached, e.g., randomly, to both ends of a molecule, the combination of barcodes, together, can constitute a tag. This number, in turn, is a function of the number of molecules falling into the class. For example, the class may be all molecules mapping to the same start-stop position on a reference genome. The class may be all molecules mapping across a particular genetic locus, e.g., a particular base or a particular region (e.g., up to 100 bases or a gene or an exon of a gene). In certain embodiments, the number of different tags used to uniquely identify a number of molecules, z, in a class can be between any of 2*z, 3*z, 4*z, 5*z, 6*z, 7*z, 8*z, 9*z, 10*z, 11*z, 12*z, 13*z, 14*z, 15*z, 16*z, 17*z, 18*z, 19*z, 20*z or 100*z (e.g., lower limit) and any of 100,000*z, 10,000*z, 1000*z or 100*z (e.g., upper limit).
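The relationship between the number of tags and the likelihood that all molecules in a class bear different tags can be estimated with a simple calculation. The sketch below assumes that tags are assigned independently and uniformly at random, which is a modeling simplification and not a statement from the disclosure; the function names are illustrative.

```python
def prob_all_distinct(num_tags, num_molecules):
    """Probability that num_molecules randomly tagged molecules all differ."""
    if num_molecules > num_tags:
        return 0.0  # More molecules than tags: a repeat is unavoidable.
    p = 1.0
    for i in range(num_molecules):
        # The (i+1)-th molecule must avoid the i tags already in use.
        p *= (num_tags - i) / num_tags
    return p


def min_tags_for(num_molecules, target):
    """Smallest tag count giving at least `target` probability of no repeats."""
    n = num_molecules
    while prob_all_distinct(n, num_molecules) < target:
        n += 1
    return n


# Tags needed so that 10 molecules in a class are distinct with >= 99% odds.
needed = min_tags_for(10, 0.99)
```

Under this model the required tag count grows roughly with the square of the class size, which is one reason to define classes narrowly (e.g., by shared start and stop coordinates).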

For example, in a sample of about 5 ng to 30 ng of cell-free DNA, one expects around 3000 molecules to map to a particular nucleotide coordinate, and between about 3 and 10 molecules having any start coordinate to share the same stop coordinate. Accordingly, about 50 to about 50,000 different tags (e.g., between about 6 and 220 barcode combinations) can suffice to uniquely tag all such molecules. To uniquely tag all 3000 molecules mapping across a nucleotide coordinate, about 1 million to about 20 million different tags would be required.

Generally, assignment of unique or non-unique tags (e.g., barcodes) in reactions follows methods and systems described by US patent applications 20010053519, 20030152490, 20110160078, and U.S. Pat. Nos. 6,582,908, 7,537,898 and 9,598,731. Tags can be linked to sample nucleic acids randomly or non-randomly.

In some embodiments, the tagged nucleic acids are sequenced after loading into a microwell plate. The microwell plate can have 96, 384, or 1536 microwells. In some cases, they are introduced at an expected ratio of unique tags to microwells. For example, the unique tags may be loaded so that more than about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the unique tags may be loaded so that less than about 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags are loaded per genome sample. In some cases, the average number of unique tags loaded per sample genome is less than, or greater than, about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 50, 100, 500, 1000, 5000, 10,000, 50,000, 100,000, 500,000, 1,000,000, 10,000,000, 50,000,000 or 1,000,000,000 unique tags per genome sample.

A preferred format uses 20-50 different tags (e.g., barcodes) ligated to both ends of target nucleic acids. For example, 35 different tags (e.g., barcodes) ligated to both ends of target molecules create 35×35 permutations, which equals 1225 tag combinations. Such numbers of tags are sufficient so that different molecules having the same start and stop points have a high probability (e.g., at least 94%, 99.5%, 99.99%, 99.999%) of receiving different combinations of tags. Other barcode combinations include any number between 10 and 500, e.g., about 15×15, about 35×35, about 75×75, about 100×100, about 250×250, about 500×500.
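The relationship between the number of tag combinations and the chance that co-located molecules receive different combinations can be sketched numerically. The following is a minimal illustration only, assuming that each molecule receives one of the available combinations independently and uniformly at random (an assumption not stated in the disclosure); the function name `prob_all_distinct` is hypothetical.

```python
def prob_all_distinct(n_combinations: int, z: int) -> float:
    """Probability that z molecules all receive different tag combinations.

    Assumes each molecule draws one of n_combinations independently and
    uniformly at random (illustrative assumption only).
    """
    if z > n_combinations:
        return 0.0  # more molecules than combinations: a repeat is certain
    p = 1.0
    for i in range(z):
        # The (i+1)-th molecule must avoid the i combinations already used.
        p *= (n_combinations - i) / n_combinations
    return p


# 35 barcodes on each end give 35 x 35 = 1225 ordered combinations.
combos = 35 * 35
for z in (3, 10):
    print(z, round(prob_all_distinct(combos, z), 4))
```

Under these assumptions, a class of 3 to 10 molecules sharing the same start and stop points falls within the range of probabilities recited above for a 35×35 format.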

In some cases, unique tags may be predetermined or random or semi-random sequence oligonucleotides. In other cases, a plurality of barcodes may be used such that barcodes are not necessarily unique to one another in the plurality. In this example, barcodes may be ligated to individual molecules such that the combination of the barcode and the sequence it may be ligated to creates a unique sequence that may be individually tracked. As described herein, detection of non-unique barcodes in combination with sequence data of beginning (start) and end (stop) portions of sequence reads may allow assignment of a unique identity to a particular molecule. The length, or number of base pairs, of an individual sequence read may also be used to assign a unique identity to such a molecule. As described herein, fragments from a single strand of nucleic acid having been assigned a unique identity, may thereby permit subsequent identification of fragments from the parent strand.
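The assignment of reads to original molecules using non-unique barcodes together with start and stop positions can be sketched as a grouping operation. This is an illustrative sketch only; the read record fields (`name`, `chrom`, `start`, `stop`, `bc5`, `bc3`) and the function name `group_reads` are hypothetical and are not taken from the disclosure.

```python
from collections import defaultdict


def group_reads(reads):
    """Group reads into families that likely derive from one original molecule.

    The family key combines mapping coordinates with the barcode pair, so a
    barcode pair only needs to be unique among molecules that share the same
    start and stop positions.
    """
    families = defaultdict(list)
    for read in reads:
        key = (read["chrom"], read["start"], read["stop"],
               read["bc5"], read["bc3"])
        families[key].append(read["name"])
    return dict(families)


reads = [
    {"name": "r1", "chrom": "chr1", "start": 100, "stop": 266, "bc5": "A", "bc3": "K"},
    {"name": "r2", "chrom": "chr1", "start": 100, "stop": 266, "bc5": "A", "bc3": "K"},
    # Same coordinates, different barcode pair: a different original molecule.
    {"name": "r3", "chrom": "chr1", "start": 100, "stop": 266, "bc5": "B", "bc3": "K"},
    # Same barcode pair, different coordinates: also a different molecule.
    {"name": "r4", "chrom": "chr1", "start": 150, "stop": 310, "bc5": "A", "bc3": "K"},
]
families = group_reads(reads)
print(len(families))
```

Read length could be added to the key in the same way where it is used as further identifying information.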

Tags can be used to label the individual polynucleotide population partitions so as to correlate the tag (or tags) with a specific partition. Alternatively, tags can be used in embodiments of the invention that do not employ a partitioning step. In some embodiments, a single tag can be used to label a specific partition. In some embodiments, multiple different tags can be used to label a specific partition. In embodiments employing multiple different tags to label a specific partition, the set of tags used to label one partition can be readily differentiated from the set of tags used to label other partitions. In some embodiments, the tags may have additional functions, for example the tags can be used to index sample sources or used as unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations, for example as in Kinde et al., Proc Nat'l Acad Sci USA 108: 9530-9535 (2011), Kou et al., PLoS ONE, 11: e0146638 (2016)) or used as non-unique molecule identifiers, for example as described in U.S. Pat. No. 9,598,731. Similarly, in some embodiments, the tags may have additional functions, for example the tags can be used to index sample sources or used as non-unique molecular identifiers (which can be used to improve the quality of sequencing data by differentiating sequencing errors from mutations).

In one embodiment, partition tagging comprises tagging molecules in each partition with a partition tag. After re-combining partitions (e.g., to reduce the number of sequencing runs needed and avoid unnecessary cost) and sequencing molecules, the partition tags identify the source partition. In another embodiment, different partitions are tagged with different sets of molecular tags, e.g., comprised of a pair of barcodes. In this way, each molecular barcode indicates the source partition as well as being useful to distinguish molecules within a partition. For example, a first set of 35 barcodes can be used to tag molecules in a first partition, while a second set of 35 barcodes can be used to tag molecules in a second partition.

In some embodiments, after partitioning and tagging with partition tags, the molecules may be pooled for sequencing in a single run. In some embodiments, a sample tag is added to the molecules, e.g., in a step subsequent to addition of partition tags and pooling. Sample tags can facilitate pooling material generated from multiple samples for sequencing in a single sequencing run.

Alternatively, in some embodiments, partition tags may be correlated to the sample as well as the partition. As a simple example, a first tag can indicate a first partition of a first sample; a second tag can indicate a second partition of the first sample; a third tag can indicate a first partition of a second sample; and a fourth tag can indicate a second partition of the second sample.
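The four-tag example above amounts to a lookup from tag to sample and partition after pooled sequencing. The sketch below is illustrative only; the tag labels (`TAG1` to `TAG4`), the sample and partition names, and the function name `assign_source` are hypothetical placeholders rather than sequences or identifiers from the disclosure.

```python
from collections import Counter

# Hypothetical tag table mirroring the four-tag example:
# each tag encodes both the sample and the partition of origin.
TAG_TABLE = {
    "TAG1": ("sample_1", "partition_1"),
    "TAG2": ("sample_1", "partition_2"),
    "TAG3": ("sample_2", "partition_1"),
    "TAG4": ("sample_2", "partition_2"),
}


def assign_source(read_tags):
    """Count pooled reads per (sample, partition) based on their tags."""
    counts = Counter()
    for tag in read_tags:
        # Tags that are not in the table are kept aside rather than guessed.
        source = TAG_TABLE.get(tag, ("unassigned", "unassigned"))
        counts[source] += 1
    return counts


counts = assign_source(["TAG1", "TAG3", "TAG1", "TAG4", "NNNN"])
print(counts)
```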

While tags may be attached to molecules already partitioned based on one or more characteristics, the final tagged molecules in the library may no longer possess that characteristic. For example, while single stranded DNA molecules may be partitioned and tagged, the final tagged molecules in the library are likely to be double stranded. Similarly, while DNA may be subject to partition based on different levels of methylation, in the final library, tagged molecules derived from these molecules are likely to be unmethylated. Accordingly, the tag attached to a molecule in the library typically indicates the characteristic of the “parent molecule” from which the ultimate tagged molecule is derived, not necessarily the characteristic of the tagged molecule itself.

As an example, barcodes 1, 2, 3, 4, etc. are used to tag and label molecules in the first partition; barcodes A, B, C, D, etc. are used to tag and label molecules in the second partition; and barcodes a, b, c, d, etc. are used to tag and label molecules in the third partition. Differentially tagged partitions can be pooled prior to sequencing. Differentially tagged partitions can be separately sequenced or sequenced together concurrently, e.g., in the same flow cell of an Illumina sequencer.

After sequencing, analysis of reads to detect genetic variants can be performed on a partition-by-partition level, as well as a whole nucleic acid population level. Tags are used to sort reads from different partitions. Analysis can include in silico analysis to determine genetic and epigenetic variation (one or more of methylation, chromatin structure, etc.) using sequence information, genomic coordinates, length, coverage, and/or copy number. In some embodiments, higher coverage can correlate with higher nucleosome occupancy in a genomic region while lower coverage can correlate with lower nucleosome occupancy or a nucleosome depleted region (NDR).

In some embodiments, adapters are used that do not comprise a sequence recognized by nucleases used in the method, and/or are resistant to cleavage, e.g., because of the presence of nucleotide modifications such as linkage modifications (e.g., phosphorothioate). In some embodiments, tags are used that do not comprise a sequence recognized by nucleases used in the method, and/or are resistant to cleavage, e.g., because of the presence of nucleotide modifications such as linkage modifications (e.g., phosphorothioate). Where both one or more methylation-dependent restriction enzymes and one or more methylation-sensitive restriction enzymes are used, the adapters and/or tags may lack methylation and may lack recognition sequences of the one or more methylation-sensitive restriction enzymes, such that they are not substrates for cleavage by any of the restriction enzymes used.

b. Alternative Methods of Modified Nucleic Acid Analysis

In some embodiments, the adapters are added to the nucleic acids after partitioning the nucleic acids; in other embodiments, the adapters may be added to the nucleic acids prior to partitioning the nucleic acids. In some such methods, a population of nucleic acids bearing the modification to different extents (e.g., 0, 1, 2, 3, 4, 5 or more methyl groups per nucleic acid molecule) is contacted with adapters before fractionation of the population depending on the extent of the modification. Adapters attach to either one end or both ends of nucleic acid molecules in the population. Preferably, the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags. Adapters, whether bearing the same or different tags, can include the same or different primer binding sites, but preferably adapters include the same primer binding site. Following attachment of adapters, the nucleic acids are contacted with an agent that preferentially binds to nucleic acids bearing the modification (such as the previously described such agents). The nucleic acids are partitioned into at least two subsamples differing in the extent to which the nucleic acids bear the modification from binding to the agents. For example, if the agent has affinity for nucleic acids bearing the modification, nucleic acids overrepresented in the modification (compared with median representation in the population) preferentially bind to the agent, whereas nucleic acids underrepresented for the modification do not bind or are more easily eluted from the agent. Following partitioning, the first subsample is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The nucleic acids are then amplified from primers binding to the primer binding sites within the adapters. Following amplification, the different partitions can then be subject to further processing steps, which typically include further (e.g., clonal) amplification, and sequence analysis, in parallel but separately. Sequence data from the different partitions can then be compared.

In another embodiment, a partitioning scheme can be performed using the following exemplary procedure. Nucleic acids are linked at both ends to Y-shaped adapters including primer binding sites and tags. The molecules are amplified. The amplified molecules are then fractionated by contact with an antibody preferentially binding to 5-methylcytosine to produce two partitions. One partition includes original molecules lacking methylation and amplification copies having lost methylation. The other partition includes original DNA molecules with methylation. The partition including original DNA molecules with methylation is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. The two partitions are then processed and sequenced separately with further amplification of the methylated partition. The sequence data of the two partitions can then be compared. In this example, tags are not used to distinguish between methylated and unmethylated DNA but rather to distinguish between different molecules within these partitions so that one can determine whether reads with the same start and stop points are based on the same or different molecules.

The disclosure provides further methods for analyzing a population of nucleic acid in which at least some of the nucleic acids include one or more modified cytosine residues, such as 5-methylcytosine and any of the other modifications described previously. In these methods, after partitioning, the subsamples of nucleic acids are contacted with adapters including one or more cytosine residues modified at the 5 position, such as 5-methylcytosine. Preferably all cytosine residues in such adapters are also modified, or all such cytosines in a primer binding region of the adapters are modified. Adapters attach to both ends of nucleic acid molecules in the population. Preferably, the adapters include different tags of sufficient numbers that the number of combinations of tags results in a high probability (e.g., 95, 99 or 99.9%) that two nucleic acids with the same start and stop points receive different combinations of tags. The primer binding sites in such adapters can be the same or different, but are preferably the same. After attachment of adapters, the nucleic acids are amplified from primers binding to the primer binding sites of the adapters. The amplified nucleic acids are split into first and second aliquots. The first aliquot is assayed for sequence data with or without further processing. The sequence data on molecules in the first aliquot is thus determined irrespective of the initial methylation state of the nucleic acid molecules. The nucleic acid molecules in the second aliquot are subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase comprises a cytosine modified at the 5 position, and the second nucleobase comprises unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils. The nucleic acids subjected to the procedure are then amplified with primers to the original primer binding sites of the adapters linked to nucleic acid. Only the nucleic acid molecules originally linked to adapters (as distinct from amplification products thereof) are now amplifiable because these nucleic acids retain cytosines in the primer binding sites of the adapters, whereas amplification products have lost the methylation of these cytosine residues, which have undergone conversion to uracils in the bisulfite treatment. Thus, only original molecules in the populations, at least some of which are methylated, undergo amplification. After amplification, these nucleic acids are subject to sequence analysis. Comparison of sequences determined from the first and second aliquots can indicate, among other things, which cytosines in the nucleic acid population were subject to methylation.

Such an analysis can be performed using the following exemplary procedure. After partitioning, methylated DNA is linked to Y-shaped adapters at both ends including primer binding sites and tags. The cytosines in the adapters are modified at the 5 position (e.g., 5-methylated). The modification of the adapters serves to protect the primer binding sites in a subsequent conversion step (e.g., bisulfite treatment, TAP conversion, or any other conversion that does not affect the modified cytosine but affects unmodified cytosine). After attachment of adapters, the DNA molecules are amplified. The amplification product is split into two aliquots for sequencing with and without conversion. The aliquot not subjected to conversion can be subjected to sequence analysis with or without further processing. The other aliquot is subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, wherein the first nucleobase comprises a cytosine modified at the 5 position, and the second nucleobase comprises unmodified cytosine. This procedure may be bisulfite treatment or another procedure that converts unmodified cytosines to uracils. Only primer binding sites protected by modification of cytosines can support amplification when contacted with primers specific for original primer binding sites. Thus, only original molecules and not copies from the first amplification are subjected to further amplification. The further amplified molecules are then subjected to sequence analysis. Sequences can then be compared from the two aliquots. As in the separation scheme discussed above, nucleic acid tags in adapters are not used to distinguish between methylated and unmethylated DNA but to distinguish nucleic acid molecules within the same partition.

3. Contacting a Subsample with a Methylation-Dependent or Methylation-Sensitive Nuclease

In some embodiments, a subsample (e.g., a first, second, or third subsample prepared by partitioning a sample as described herein, such as on the basis of a level of a cytosine modification, such as methylation, e.g., 5-methylation) is contacted with a methylation-dependent nuclease or methylation-sensitive nuclease. Unless otherwise indicated, where partitioning is performed on the basis of a cytosine modification, the first subsample is the subsample with a higher level of the modification; the second subsample is the subsample with a lower level of the modification; and, when present, the third subsample has a level of the modification intermediate between the first and second subsamples.

As discussed above, partitioning procedures may result in imperfect sorting of DNA molecules among the subsamples. The choice of a methylation-dependent nuclease or methylation-sensitive nuclease can be made so as to degrade nonspecifically partitioned DNA. For example, the second subsample can be contacted with a methylation-dependent nuclease, such as a methylation-dependent restriction enzyme. This can degrade nonspecifically partitioned DNA in the second subsample (e.g., methylated DNA) to produce a treated second subsample. Alternatively or in addition, the first subsample can be contacted with a methylation-sensitive endonuclease, such as a methylation-sensitive restriction enzyme, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample. Degradation of nonspecifically partitioned DNA in either or both of the first or second subsamples is proposed as an improvement to the performance of methods that rely on accurate partitioning of DNA on the basis of a cytosine modification, e.g., to detect the presence of aberrantly modified DNA in a sample, to determine the tissue of origin of DNA, and/or to determine whether a subject has cancer. For example, such degradation may provide improved sensitivity and/or simplify downstream analyses. In general, where nonspecifically partitioned DNA would be hypermethylated, such as in a hypomethylated partition, a methylation-dependent nuclease, such as a methylation-dependent restriction enzyme, should be used. Conversely, where nonspecifically partitioned DNA would be hypomethylated, such as in a hypermethylated partition, a methylation-sensitive nuclease, such as a methylation-sensitive restriction enzyme, should be used. Methylation-dependent nucleases, such as methylation-dependent restriction enzymes, preferentially cut methylated DNA relative to unmethylated DNA, while methylation-sensitive nucleases, such as methylation-sensitive restriction enzymes, preferentially cut unmethylated DNA relative to methylated DNA.

In contacting a subsample with a nuclease, one or more nucleases can be used. In some embodiments, a subsample is contacted with a plurality of nucleases. The subsample may be contacted with the nucleases sequentially or simultaneously. Simultaneous use of nucleases may be advantageous when the nucleases are active under similar conditions (e.g., buffer composition) to avoid unnecessary sample manipulation. Contacting the second subsample with more than one methylation-dependent restriction enzyme can more completely degrade nonspecifically partitioned hypermethylated DNA. Similarly, contacting the first subsample with more than one methylation-sensitive restriction enzyme can more completely degrade nonspecifically partitioned hypomethylated and/or unmethylated DNA.

In some embodiments, a methylation-dependent nuclease comprises one or more of MspJI, LpnPI, FspEI, or McrBC. In some embodiments, at least two methylation-dependent nucleases are used. In some embodiments, at least three methylation-dependent nucleases are used. In some embodiments, the methylation-dependent nuclease comprises FspEI. In some embodiments, the methylation-dependent nuclease comprises FspEI and MspJI, e.g., used sequentially.

In some embodiments, a methylation-sensitive nuclease comprises one or more of AatII, AccII, AciI, Aor13HI, Aor15HI, BspT104I, BssHII, BstUI, Cfr10I, ClaI, CpoI, Eco52I, HaeII, HapII, HhaI, Hin6I, HpaII, HpyCH4IV, MluI, MspI, NaeI, NotI, NruI, NsbI, PmaCI, Psp1406I, PvuI, SacII, SalI, SmaI, and SnaBI. In some embodiments, at least two methylation-sensitive nucleases are used. In some embodiments, at least three methylation-sensitive nucleases are used. In some embodiments, the methylation-sensitive nucleases comprise BstUI and HpaII. In some embodiments, the two methylation-sensitive nucleases comprise HhaI and AccII. In some embodiments, the methylation-sensitive nucleases comprise BstUI, HpaII and Hin6I.

In some embodiments, FspEI is used for digesting the nucleic acid molecules in at least one subsample (e.g., a hypomethylated partition). In some embodiments, BstUI, HpaII and Hin6I are used for digesting the nucleic acid molecules in at least one subsample (e.g., a hypermethylated partition) and FspEI is used for digesting the nucleic acid molecules in at least one other subsample (e.g., a hypomethylated partition). In embodiments involving an intermediately methylated partition, the nucleic acid molecules therein may be digested with a methylation-sensitive nuclease or a methylation-dependent nuclease. In some embodiments, the nucleic acid molecules in an intermediately methylated partition are digested with the same nuclease(s) as the hypermethylated partition. For example, the intermediately methylated partition may be pooled with the hypermethylated partition and then the pooled partitions may be subjected to digestion. In some embodiments, the nucleic acid molecules in an intermediately methylated partition are digested with the same nuclease(s) as the hypomethylated partition. For example, the intermediately methylated partition may be pooled with the hypomethylated partition and then the pooled partitions may be subjected to digestion.
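The pairing of partitions with nuclease classes described in this section can be summarized as a small lookup. The sketch below is illustrative only: it encodes one of the enzyme combinations mentioned above (BstUI, HpaII and Hin6I for a hypermethylated partition; FspEI for a hypomethylated partition), and the names `DIGESTION_PLAN`, `plan_digestion` and the parameter `pool_intermediate_with` are hypothetical.

```python
# One exemplary assignment of nuclease class and enzymes per partition.
DIGESTION_PLAN = {
    # Contaminating DNA here is hypo/unmethylated, so cut unmethylated DNA.
    "hypermethylated": ("methylation-sensitive", ("BstUI", "HpaII", "Hin6I")),
    # Contaminating DNA here is hypermethylated, so cut methylated DNA.
    "hypomethylated": ("methylation-dependent", ("FspEI",)),
}


def plan_digestion(partition, pool_intermediate_with="hypermethylated"):
    """Return (nuclease class, enzymes) for a partition.

    An intermediately methylated partition is treated like the partition
    it is pooled with; either choice is contemplated in the text.
    """
    if partition == "intermediate":
        partition = pool_intermediate_with
    if partition not in DIGESTION_PLAN:
        raise ValueError(f"unknown partition: {partition!r}")
    return DIGESTION_PLAN[partition]


print(plan_digestion("hypomethylated"))
print(plan_digestion("intermediate", pool_intermediate_with="hypomethylated"))
```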

In some embodiments, a subsample is contacted with a nuclease as described above after a step of tagging or attaching adapters to both ends of the DNA. The tags or adapters can be resistant to cleavage by the nuclease using any of the approaches described above. In this approach, cleavage can prevent the nonspecifically partitioned molecule from being carried through the analysis because the cleavage products lack tags or adapters at both ends.

Alternatively, a step of tagging or attaching adapters can be performed after cleavage with a nuclease as described above. Cleaved molecules can be then identified in sequence reads based on having an end (point of attachment to tag or adapter) corresponding to a nuclease recognition site. Processing the molecules in this way can also allow the acquisition of information from the cleaved molecule, e.g., observation of somatic mutations. When tagging or attaching adapters after contacting the subsample with a nuclease, and low molecular weight DNA such as cfDNA is being analyzed, it may be desirable to remove high molecular weight DNA (such as contaminating genomic DNA) from the sample before the contacting step. It may also be desirable to use nucleases that can be heat-inactivated at a relatively low temperature (e.g., 65° C. or less, or 60° C. or less) to avoid denaturing DNA, in that denaturation may interfere with subsequent ligation steps.
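Identification of cleaved molecules from read ends can be sketched as a check of whether a fragment boundary falls inside an occurrence of a recognition sequence in the reference. This is a simplified illustration under several assumptions that are not part of the disclosure: the nuclease cuts within its recognition sequence (as HpaII does at CCGG), coordinates are 0-based with an exclusive end, and end repair has not shifted the boundary out of the site. Nucleases that cut at a distance from their recognition sequence would need a different rule. The function names are hypothetical.

```python
def motif_spans(reference, motif):
    """Return (start, end) spans of every occurrence of motif in reference."""
    spans = []
    i = reference.find(motif)
    while i != -1:
        spans.append((i, i + len(motif)))
        i = reference.find(motif, i + 1)  # allow overlapping occurrences
    return spans


def end_at_site(reference, frag_start, frag_end, motif):
    """True if either fragment boundary lies strictly inside a motif occurrence.

    frag_start is the index of the first base of the fragment and frag_end is
    one past its last base, so each value names a boundary between two bases.
    """
    for s, e in motif_spans(reference, motif):
        if s < frag_start < e or s < frag_end < e:
            return True
    return False


reference = "AAAACCGGTTTTGGGG"  # toy reference with one CCGG site at 4..8
print(end_at_site(reference, 5, 16, "CCGG"))  # boundary inside the site
print(end_at_site(reference, 0, 12, "CCGG"))  # both boundaries outside it
```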

Where a sample is partitioned into three subsamples, including a third subsample containing intermediately methylated molecules, the third subsample is in some embodiments contacted with a methylation-sensitive nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third subsamples are combined before being contacted with a methylation-sensitive nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the first and third subsamples are differentially tagged before being combined.

Alternatively, where a sample is partitioned into three subsamples, including a third subsample containing intermediately methylated molecules, the third subsample is in some embodiments contacted with a methylation-dependent nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third subsamples are combined before being contacted with a methylation-dependent nuclease. Such a step may have any of the features described elsewhere herein with respect to contacting steps, and may be performed before or after a step of tagging or attaching adapters as discussed above. In some embodiments, the second and third subsamples are differentially tagged before being combined.

In some embodiments, the DNA is purified after being contacted with the nuclease, e.g., using SPRI beads. Such purification may occur after heat inactivation of the nuclease. Alternatively, purification can be omitted; thus, for example, a subsequent step such as amplification can be performed on the subsample containing heat-inactivated nuclease. In another embodiment, the contacting step can occur in the presence of a purification reagent such as SPRI beads, e.g., to minimize losses associated with tube transfers. After cleavage and heat inactivation, the SPRI beads can be re-used for cleanup by adding molecular crowding reagents (e.g., PEG) and salt.

4. Subjecting the First Subsample to a Procedure that Affects a First Nucleobase in the DNA Differently from a Second Nucleobase in the DNA of the First Subsample

Methods disclosed herein may comprise a step of subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity (e.g., while the second subsample is contacted with a methylation-dependent nuclease according to any of the embodiments described elsewhere herein). In some embodiments, if the first nucleobase is a modified or unmodified adenine, then the second nucleobase is a modified or unmodified adenine; if the first nucleobase is a modified or unmodified cytosine, then the second nucleobase is a modified or unmodified cytosine; if the first nucleobase is a modified or unmodified guanine, then the second nucleobase is a modified or unmodified guanine; and if the first nucleobase is a modified or unmodified thymine, then the second nucleobase is a modified or unmodified thymine (where modified and unmodified uracil are encompassed within modified thymine for the purpose of this step). Such a procedure can be used to identify nucleotides in the subsample that have or lack certain modifications, such as methylation.

In some embodiments, the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine. For example, the first nucleobase may comprise unmodified cytosine (C) and the second nucleobase may comprise one or more of 5-methylcytosine (mC) and 5-hydroxymethylcytosine (hmC). Alternatively, the second nucleobase may comprise C and the first nucleobase may comprise one or more of mC and hmC. Other combinations are also possible, as indicated, e.g., in the Summary above and the following discussion, such as where one of the first and second nucleobases comprises mC and the other comprises hmC.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises bisulfite conversion. Treatment with bisulfite converts unmodified cytosine and certain modified cytosine nucleotides (e.g., 5-formylcytosine (fC) or 5-carboxylcytosine (caC)) to uracil whereas other modified cytosines (e.g., 5-methylcytosine, 5-hydroxymethylcytosine) are not converted. Thus, where bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, 5-formylcytosine, 5-carboxylcytosine, or other cytosine forms affected by bisulfite, and the second nucleobase may comprise one or more of mC and hmC, such as mC and optionally hmC. Sequencing of bisulfite-treated DNA identifies positions that are read as cytosine as being mC or hmC positions. Meanwhile, positions that are read as T are identified as being T or a bisulfite-susceptible form of C, such as unmodified cytosine, 5-formylcytosine, or 5-carboxylcytosine. Performing bisulfite conversion on a first subsample as described herein thus facilitates identifying positions containing mC or hmC using the sequence reads obtained from the first subsample. For an exemplary description of bisulfite conversion, see, e.g., Moss et al., Nat Commun. 2018; 9: 5068.
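The read-out logic for bisulfite-treated DNA can be sketched by comparing a converted read with the reference at reference cytosine positions. This is a minimal illustration only, assuming an ungapped, same-strand alignment of equal length (real pipelines also handle the opposite strand, where G-to-A changes are observed, as well as indels and sequencing errors); the function name `call_cytosines` and the toy sequences are hypothetical.

```python
def call_cytosines(reference, converted_read):
    """Classify each reference cytosine from a bisulfite-converted read.

    'protected' : still read as C, consistent with mC or hmC
    'converted' : read as T, consistent with unmodified C, fC or caC
    'other'     : any other base (e.g., a variant or a sequencing error)
    """
    if len(reference) != len(converted_read):
        raise ValueError("sketch assumes an ungapped alignment of equal length")
    calls = {}
    for pos, (ref_base, read_base) in enumerate(zip(reference, converted_read)):
        if ref_base != "C":
            continue  # only reference cytosines are informative here
        if read_base == "C":
            calls[pos] = "protected"
        elif read_base == "T":
            calls[pos] = "converted"
        else:
            calls[pos] = "other"
    return calls


calls = call_cytosines("ACGTCCGA", "ACGTTCGA")
print(calls)
```

A reference T that is read as T is not classified, which mirrors the ambiguity noted above between genuine T and converted cytosine when no reference is consulted.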

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises oxidative bisulfite (Ox-BS) conversion. This procedure first converts hmC to fC, which is bisulfite susceptible, followed by bisulfite conversion. Thus, when oxidative bisulfite conversion is used, the first nucleobase comprises one or more of unmodified cytosine, fC, caC, hmC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises mC. Sequencing of Ox-BS converted DNA identifies positions that are read as cytosine as being mC positions. Meanwhile, positions that are read as T are identified as being T, hmC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or caC. Performing Ox-BS conversion on a first subsample as described herein thus facilitates identifying positions containing mC using the sequence reads obtained from the first subsample. For an exemplary description of oxidative bisulfite conversion, see, e.g., Booth et al., Science 2012; 336: 934-937.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises Tet-assisted bisulfite (TAB) conversion. In TAB conversion, hmC is protected from conversion and mC is oxidized in advance of bisulfite treatment, so that positions originally occupied by mC are converted to U while positions originally occupied by hmC remain as a protected form of cytosine. For example, as described in Yu et al., Cell 2012; 149: 1368-80, β-glucosyl transferase can be used to protect hmC (forming 5-glucosylhydroxymethylcytosine (ghmC)), then a TET protein such as mTet1 can be used to convert mC to caC, and then bisulfite treatment can be used to convert C and caC to U while ghmC remains unaffected. Thus, when TAB conversion is used, the first nucleobase comprises one or more of unmodified cytosine, fC, caC, mC, or other cytosine forms affected by bisulfite, and the second nucleobase comprises hmC. Sequencing of TAB-converted DNA identifies positions that are read as cytosine as being hmC positions. Meanwhile, positions that are read as T are identified as being T, mC, or a bisulfite-susceptible form of C, such as unmodified cytosine, fC, or caC. Performing TAB conversion on a first subsample as described herein thus facilitates identifying positions containing hmC using the sequence reads obtained from the first subsample.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises Tet-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In Tet-assisted conversion with a substituted borane reducing agent, a TET protein is used to convert mC and hmC to caC, without affecting unmodified C. The caC, and fC if present, are then converted to dihydrouracil (DHU) by treatment with 2-picoline borane (pic-borane) or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting unmodified C. See, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429 (e.g., at Supplementary FIG. 1 and Supplementary Note 7). DHU is read as a T in sequencing. Thus, when this type of conversion is used, the first nucleobase comprises one or more of mC, fC, caC, or hmC, and the second nucleobase comprises unmodified cytosine. Sequencing of the converted DNA identifies positions that are read as cytosine as being unmodified C positions. Meanwhile, positions that are read as T are identified as being T, mC, fC, caC, or hmC. Performing TAP conversion on a first subsample as described herein thus facilitates identifying positions containing unmodified C using the sequence reads obtained from the first subsample. This procedure encompasses Tet-assisted pyridine borane sequencing (TAPS), described in further detail in Liu et al. 2019, supra.

Alternatively, protection of hmC (e.g., using βGT) can be combined with Tet-assisted conversion with a substituted borane reducing agent. hmC can be protected as noted above through glucosylation using βGT, forming ghmC. Treatment with a TET protein such as mTet1 then converts mC to caC but does not convert C or ghmC. caC is then converted to DHU by treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane, also without affecting unmodified C or ghmC. Thus, when Tet-assisted conversion with a substituted borane reducing agent is used together with hmC protection, the first nucleobase comprises mC, and the second nucleobase comprises one or more of unmodified cytosine or hmC, such as unmodified cytosine and optionally hmC, fC, and/or caC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either hmC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, fC, caC, or mC. Performing TAPSβ conversion on a first subsample as described herein thus facilitates distinguishing positions containing unmodified C or hmC on the one hand from positions containing mC on the other, using the sequence reads obtained from the first subsample. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises chemical-assisted conversion with a substituted borane reducing agent, optionally wherein the substituted borane reducing agent is 2-picoline borane, borane pyridine, tert-butylamine borane, or ammonia borane. In chemical-assisted conversion with a substituted borane reducing agent, an oxidizing agent such as potassium perruthenate (KRuO₄) (also suitable for use in Ox-BS conversion) is used to specifically oxidize hmC to fC. Treatment with pic-borane or another substituted borane reducing agent such as borane pyridine, tert-butylamine borane, or ammonia borane converts fC and caC to DHU but does not affect mC or unmodified C. Thus, when this type of conversion is used, the first nucleobase comprises one or more of hmC, fC, and caC, and the second nucleobase comprises one or more of unmodified cytosine or mC, such as unmodified cytosine and optionally mC. Sequencing of the converted DNA identifies positions that are read as cytosine as being either mC or unmodified C positions. Meanwhile, positions that are read as T are identified as being T, fC, caC, or hmC. Performing this type of conversion on a first subsample as described herein thus facilitates distinguishing positions containing unmodified C or mC on the one hand from positions containing hmC using the sequence reads obtained from the first subsample. For an exemplary description of this type of conversion, see, e.g., Liu et al., Nature Biotechnology 2019; 37:424-429.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises APOBEC-coupled epigenetic (ACE) conversion. In ACE conversion, an AID/APOBEC family DNA deaminase enzyme such as APOBEC3A (A3A) is used to deaminate unmodified cytosine and mC without deaminating hmC, fC, or caC. Thus, when ACE conversion is used, the first nucleobase comprises unmodified C and/or mC (e.g., unmodified C and optionally mC), and the second nucleobase comprises hmC. Sequencing of ACE-converted DNA identifies positions that are read as cytosine as being hmC, fC, or caC positions. Meanwhile, positions that are read as T are identified as being T, unmodified C, or mC. Performing ACE conversion on a first subsample as described herein thus facilitates distinguishing positions containing hmC from positions containing mC or unmodified C using the sequence reads obtained from the first subsample. For an exemplary description of ACE conversion, see, e.g., Schutsky et al., Nature Biotechnology 2018; 36: 1083-1090.

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises enzymatic conversion of the first nucleobase, e.g., as in EM-Seq. See, e.g., Vaisvila R, et al. (2019) EM-seq: Detection of DNA methylation at single base resolution from picograms of DNA. bioRxiv; DOI: 10.1101/2019.12.20.884692, available at www.biorxiv.org/content/10.1101/2019.12.20.884692v. For example, TET2 and T4-βGT can be used to convert 5mC and 5hmC into substrates that cannot be deaminated by a deaminase (e.g., APOBEC3A), and then a deaminase (e.g., APOBEC3A) can be used to deaminate unmodified cytosines, converting them to uracils.
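By way of illustration only, the read-outs of the conversion procedures described above can be collected into a lookup table. The following is a minimal sketch, not a definitive implementation. The procedure labels (e.g., "TAPS-beta", "chem-borane"), the function names, and the data layout are assumptions introduced for this example; the base assignments follow the descriptions given above, and cytosine forms whose fate is not stated for a given procedure are deliberately left out.

```python
# Illustrative lookup of how each cytosine form is read after conversion.
# Labels and function names are hypothetical; base assignments follow the
# descriptions of each procedure in the text above.
# Each entry: (forms read as T after conversion, forms still read as C).
PROCEDURES = {
    "Ox-BS":       ({"C", "fC", "caC", "hmC"}, {"mC"}),
    "TAB":         ({"C", "fC", "caC", "mC"},  {"hmC"}),
    "TAPS":        ({"mC", "hmC", "fC", "caC"}, {"C"}),
    "TAPS-beta":   ({"mC", "fC", "caC"},       {"C", "hmC"}),   # hmC protected as ghmC
    "chem-borane": ({"hmC", "fC", "caC"},      {"C", "mC"}),    # chemical oxidation + borane
    "ACE":         ({"C", "mC"},               {"hmC", "fC", "caC"}),
    "EM-seq":      ({"C"},                     {"mC", "hmC"}),  # fC/caC not stated above
}


def read_as(procedure: str, base: str) -> str:
    """Return 'T' or 'C' for an original cytosine form under a procedure."""
    converted, retained = PROCEDURES[procedure]
    if base in converted:
        return "T"
    if base in retained:
        return "C"
    raise ValueError(f"fate of {base!r} under {procedure!r} is not stated")


def forms_read_as_c(procedure: str) -> list:
    """Cytosine forms that a C read can represent under a procedure."""
    return sorted(PROCEDURES[procedure][1])
```

For example, under these assumptions `read_as("Ox-BS", "mC")` gives "C" while `read_as("Ox-BS", "hmC")` gives "T", mirroring the statement that Ox-BS identifies positions read as cytosine as mC positions. An original T is read as T under every procedure and is therefore not modeled.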

In some embodiments, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample comprises separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase. In some such embodiments, the first nucleobase is hmC. DNA originally comprising the first nucleobase may be separated from other DNA using a labeling procedure comprising biotinylating positions that originally comprised the first nucleobase. In some embodiments, the first nucleobase is first derivatized with an azide-containing moiety, such as a glucosyl-azide containing moiety. The azide-containing moiety then may serve as a reagent for attaching biotin, e.g., through Huisgen cycloaddition chemistry. Then, the DNA originally comprising the first nucleobase, now biotinylated, can be separated from DNA not originally comprising the first nucleobase using a biotin-binding agent, such as avidin, neutravidin (deglycosylated avidin with an isoelectric point of about 6.3), or streptavidin. An example of a procedure for separating DNA originally comprising the first nucleobase from DNA not originally comprising the first nucleobase is hmC-seal, which labels hmC to form β-6-azide-glucosyl-5-hydroxymethylcytosine and then attaches a biotin moiety through Huisgen cycloaddition, followed by separation of the biotinylated DNA from other DNA using a biotin-binding agent. For an exemplary description of hmC-seal, see, e.g., Han et al., Mol. Cell 2016; 63: 711-719. This approach is useful for identifying fragments that include one or more hmC nucleobases.

In some embodiments, following such a separation, the method further comprises differentially tagging each of the DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second subsample. The method may further comprise pooling the DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second subsample following differential tagging. The DNA originally comprising the first nucleobase, the DNA not originally comprising the first nucleobase, and the DNA of the second subsample may then be sequenced in the same sequencing cell while retaining the ability to resolve whether a given read came from a molecule of DNA originally comprising the first nucleobase, DNA not originally comprising the first nucleobase, or DNA of the second subsample using the differential tags.

In some embodiments, the first nucleobase is a modified or unmodified adenine, and the second nucleobase is a modified or unmodified adenine. In some embodiments, the modified adenine is N⁶-methyladenine (mA). In some embodiments, the modified adenine is one or more of N⁶-methyladenine (mA), N⁶-hydroxymethyladenine (hmA), or N⁶-formyladenine (fA).

Techniques comprising methylated DNA immunoprecipitation (MeDIP) can be used to separate DNA containing modified bases such as mA from other DNA. See, e.g., Kumar et al., Frontiers Genet. 2018; 9: 640; Greer et al., Cell 2015; 161: 868-878. An antibody specific for mA is described in Sun et al., Bioessays 2015; 37:1155-62. Antibodies for various modified nucleobases, such as forms of thymine/uracil including halogenated forms such as 5-bromouracil, are commercially available. Various modified bases can also be detected based on alterations in their base-pairing specificity. For example, hypoxanthine is a modified form of adenine that can result from deamination and is read in sequencing as a G. See, e.g., U.S. Pat. No. 8,486,630; Brown, Genomes, 2nd Ed., John Wiley & Sons, Inc., New York, N.Y., 2002, chapter 14, “Mutation, Repair, and Recombination.”

5. Enriching/Capturing Step; Amplification; Adaptors; Barcodes

In some embodiments, methods disclosed herein comprise a step of capturing one or more sets of target regions of DNA, such as cfDNA. Capture may be performed using any suitable approach known in the art.

In some embodiments, capturing comprises contacting the DNA to be captured with a set of target-specific probes. The set of target-specific probes may have any of the features described herein for sets of target-specific probes, including but not limited to in the embodiments set forth above and the sections relating to probes below. Capturing may be performed on one or more subsamples prepared during methods disclosed herein. In some embodiments, DNA is captured from at least the first subsample or the second subsample, e.g., at least the first subsample and the second subsample. Where the first subsample undergoes a separation step (e.g., separating DNA originally comprising the first nucleobase (e.g., hmC) from DNA not originally comprising the first nucleobase, such as hmC-seal), capturing may be performed on any, any two, or all of the DNA originally comprising the first nucleobase (e.g., hmC), the DNA not originally comprising the first nucleobase, and the second subsample. In some embodiments, the subsamples are differentially tagged (e.g., as described herein) and then pooled before undergoing capture.

The capturing step may be performed using conditions suitable for specific nucleic acid hybridization, which generally depend to some extent on features of the probes such as length, base composition, etc. Those skilled in the art will be familiar with appropriate conditions given general knowledge in the art regarding nucleic acid hybridization. In some embodiments, complexes of target-specific probes and DNA are formed.

In some embodiments, a method described herein comprises capturing cfDNA obtained from a test subject for a plurality of sets of target regions. The target regions comprise epigenetic target regions, which may show differences in methylation levels and/or fragmentation patterns depending on whether they originated from a tumor or from healthy cells. The target regions also comprise sequence-variable target regions, which may show differences in sequence depending on whether they originated from a tumor or from healthy cells. The capturing step produces a captured set of cfDNA molecules, and the cfDNA molecules corresponding to the sequence-variable target region set are captured at a greater capture yield in the captured set of cfDNA molecules than cfDNA molecules corresponding to the epigenetic target region set. For additional discussion of capturing steps, capture yields, and related aspects, see WO2020/160414, which is incorporated herein by reference for all purposes.

In some embodiments, a method described herein comprises contacting cfDNA obtained from a test subject with a set of target-specific probes, wherein the set of target-specific probes is configured to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set.

It can be beneficial to capture cfDNA corresponding to the sequence-variable target region set at a greater capture yield than cfDNA corresponding to the epigenetic target region set because a greater depth of sequencing may be necessary to analyze the sequence-variable target regions with sufficient confidence or accuracy than may be necessary to analyze the epigenetic target regions. The volume of data needed to determine fragmentation patterns (e.g., to test for perturbation of transcription start sites or CTCF binding sites) or fragment abundance (e.g., in hypermethylated and hypomethylated partitions) is generally less than the volume of data needed to determine the presence or absence of cancer-related sequence mutations. Capturing the target region sets at different yields can facilitate sequencing the target regions to different depths of sequencing in the same sequencing run (e.g., using a pooled mixture and/or in the same sequencing cell).

In various embodiments, the methods further comprise sequencing the captured cfDNA, e.g., to different degrees of sequencing depth for the epigenetic and sequence-variable target region sets, consistent with the discussion herein.

In some embodiments, complexes of target-specific probes and DNA are separated from DNA not bound to target-specific probes. For example, where target-specific probes are bound covalently or noncovalently to a solid support, a washing or aspiration step can be used to separate unbound material. Alternatively, where the complexes have chromatographic properties distinct from unbound material (e.g., where the probes comprise a ligand that binds a chromatographic resin), chromatography can be used.

As discussed in detail elsewhere herein, the set of target-specific probes may comprise a plurality of sets such as probes for a sequence-variable target region set and probes for an epigenetic target region set. In some such embodiments, the capturing step is performed with the probes for the sequence-variable target region set and the probes for the epigenetic target region set in the same vessel at the same time, e.g., the probes for the sequence-variable and epigenetic target region sets are in the same composition. This approach provides a relatively streamlined workflow. In some embodiments, the concentration of the probes for the sequence-variable target region set is greater than the concentration of the probes for the epigenetic target region set.

Alternatively, the capturing step is performed with the sequence-variable target region probe set in a first vessel and with the epigenetic target region probe set in a second vessel, or the contacting step is performed with the sequence-variable target region probe set at a first time and a first vessel and the epigenetic target region probe set at a second time before or after the first time. This approach allows for preparation of separate first and second compositions comprising captured DNA corresponding to the sequence-variable target region set and captured DNA corresponding to the epigenetic target region set. The compositions can be processed separately as desired (e.g., to fractionate based on methylation as described elsewhere herein) and recombined in appropriate proportions to provide material for further processing and analysis such as sequencing.

In some embodiments, the DNA is amplified. In some embodiments, amplification is performed before the capturing step. In some embodiments, amplification is performed after the capturing step.

In some embodiments, adapters are included in the DNA. This may be done concurrently with an amplification procedure, e.g., by providing the adapters in a 5′ portion of a primer, e.g., as described above. Alternatively, adapters can be added by other approaches, such as ligation.

In some embodiments, tags, which may be or include barcodes, are included in the DNA. Tags can facilitate identification of the origin of a nucleic acid. For example, barcodes can be used to allow the origin (e.g., subject) whence the DNA came to be identified following pooling of a plurality of samples for parallel sequencing. This may be done concurrently with an amplification procedure, e.g., by providing the barcodes in a 5′ portion of a primer, e.g., as described above. In some embodiments, adapters and tags/barcodes are provided by the same primer or primer set. For example, the barcode may be located 3′ of the adapter and 5′ of the target-hybridizing portion of the primer. Alternatively, barcodes can be added by other approaches, such as ligation, optionally together with adapters in the same ligation substrate.
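As a purely illustrative sketch of the primer layout just described (adapter at the 5′ end, barcode 3′ of the adapter, target-hybridizing portion 3′ of the barcode), the following shows how such a primer might be assembled and how the origin of a read could be resolved from its barcode after pooling. All sequences, subject labels, and function names are hypothetical and chosen only for this example; real workflows typically also tolerate sequencing errors in the barcode, which this sketch does not.

```python
# Hypothetical sequences for illustration only; not taken from the disclosure.
ADAPTER = "ACACTCTTTCCCTACACGAC"
BARCODES = {"subject_1": "ACGTAC", "subject_2": "TGCATG"}
TARGET_PORTION = "GATTACAGATTACA"


def build_primer(adapter: str, barcode: str, target_portion: str) -> str:
    """Assemble a primer 5'->3': adapter, then barcode, then target portion."""
    return adapter + barcode + target_portion


def origin_of(read: str, adapter: str, barcodes: dict):
    """Return the origin whose barcode directly follows the adapter, else None."""
    if not read.startswith(adapter):
        return None
    offset = len(adapter)
    for origin, barcode in barcodes.items():
        # Exact match only; no error tolerance in this sketch.
        if read.startswith(barcode, offset):
            return origin
    return None


primer = build_primer(ADAPTER, BARCODES["subject_2"], TARGET_PORTION)
```

Under these assumptions, a read that begins with `primer` resolves to "subject_2", while a read lacking the adapter resolves to no origin.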

Additional details regarding amplification, tags, and barcodes are discussed in the “General Features of the Methods” section below, which can be combined to the extent practicable with any of the foregoing embodiments and the embodiments set forth in the introduction and summary section.

6. Captured Set

In some embodiments, a captured set of DNA (e.g., cfDNA) is provided. With respect to the disclosed methods, the captured set of DNA may be provided, e.g., by performing a capturing step after a partitioning step as described herein. The captured set may comprise DNA corresponding to a sequence-variable target region set, an epigenetic target region set, or a combination thereof.

In some embodiments, a first target region set is captured from the first subsample, comprising at least epigenetic target regions. The epigenetic target regions captured from the first subsample may comprise hypermethylation variable target regions. In some embodiments, the hypermethylation variable target regions are CpG-containing regions that are unmethylated or have low methylation in cfDNA from healthy subjects (e.g., below-average methylation relative to bulk cfDNA). In some embodiments, the hypermethylation variable target regions are regions that show lower methylation in healthy cfDNA than in at least one other tissue type. Without wishing to be bound by any particular theory, cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type. As such, the distribution of tissue of origin of cfDNA may change upon carcinogenesis. Thus, an increase in the level of hypermethylation variable target regions in the first subsample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

In some embodiments, a second target region set is captured from the second subsample, comprising at least epigenetic target regions. The epigenetic target regions may comprise hypomethylation variable target regions. In some embodiments, the hypomethylation variable target regions are CpG-containing regions that are methylated or have high methylation in cfDNA from healthy subjects (e.g., above-average methylation relative to bulk cfDNA). In some embodiments, the hypomethylation variable target regions are regions that show higher methylation in healthy cfDNA than in at least one other tissue type. Without wishing to be bound by any particular theory, cancer cells may shed more DNA into the bloodstream than healthy cells of the same tissue type. As such, the distribution of tissue of origin of cfDNA may change upon carcinogenesis. Thus, an increase in the level of hypomethylation variable target regions in the second subsample can be an indicator of the presence (or recurrence, depending on the history of the subject) of cancer.

In some embodiments, the quantity of captured sequence-variable target region DNA is greater than the quantity of the captured epigenetic target region DNA, when normalized for the difference in the size of the targeted regions (footprint size).

Alternatively, first and second captured sets may be provided, comprising, respectively, DNA corresponding to a sequence-variable target region set and DNA corresponding to an epigenetic target region set. The first and second captured sets may be combined to provide a combined captured set.

In some embodiments in which a captured set comprising DNA corresponding to the sequence-variable target region set and the epigenetic target region set includes a combined captured set as discussed above, the DNA corresponding to the sequence-variable target region set may be present at a greater concentration than the DNA corresponding to the epigenetic target region set, e.g., a 1.1- to 1.2-fold greater concentration, a 1.2- to 1.4-fold greater concentration, a 1.4- to 1.6-fold greater concentration, a 1.6- to 1.8-fold greater concentration, a 1.8- to 2.0-fold greater concentration, a 2.0- to 2.2-fold greater concentration, a 2.2- to 2.4-fold greater concentration, a 2.4- to 2.6-fold greater concentration, a 2.6- to 2.8-fold greater concentration, a 2.8- to 3.0-fold greater concentration, a 3.0- to 3.5-fold greater concentration, a 3.5- to 4.0-fold greater concentration, a 4.0- to 4.5-fold greater concentration, a 4.5- to 5.0-fold greater concentration, a 5.0- to 5.5-fold greater concentration, a 5.5- to 6.0-fold greater concentration, a 6.0- to 6.5-fold greater concentration, a 6.5- to 7.0-fold greater concentration, a 7.0- to 7.5-fold greater concentration, a 7.5- to 8.0-fold greater concentration, an 8.0- to 8.5-fold greater concentration, an 8.5- to 9.0-fold greater concentration, a 9.0- to 9.5-fold greater concentration, a 9.5- to 10.0-fold greater concentration, a 10- to 11-fold greater concentration, an 11- to 12-fold greater concentration, a 12- to 13-fold greater concentration, a 13- to 14-fold greater concentration, a 14- to 15-fold greater concentration, a 15- to 16-fold greater concentration, a 16- to 17-fold greater concentration, a 17- to 18-fold greater concentration, an 18- to 19-fold greater concentration, a 19- to 20-fold greater concentration, a 20- to 30-fold greater concentration, a 30- to 40-fold greater concentration, a 40- to 50-fold greater concentration, a 50- to 60-fold greater concentration, a 60- to 70-fold greater concentration, a 70- to 80-fold greater concentration, an 80- to 90-fold greater concentration, or a 90- to 100-fold greater concentration. The degree of difference in concentrations accounts for normalization for the footprint sizes of the target regions, as discussed in the definition section.
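To illustrate what a footprint-normalized fold difference could look like numerically, the following sketch divides the amount of captured DNA for each target region set by that set's footprint and then takes the ratio. This is an assumption about how the normalization may be computed; the governing definition is the one given in the definition section, which is not reproduced here. The function name, parameter names, and example quantities are hypothetical.

```python
def footprint_normalized_fold(sv_amount: float, sv_footprint_bp: float,
                              epi_amount: float, epi_footprint_bp: float) -> float:
    """Ratio of per-base amounts: sequence-variable set over epigenetic set.

    Amounts may be in any consistent unit (e.g., ng or molecule counts).
    Assumes normalization means dividing each amount by its footprint size.
    """
    if sv_footprint_bp <= 0 or epi_footprint_bp <= 0:
        raise ValueError("footprints must be positive")
    if epi_amount <= 0:
        raise ValueError("epigenetic amount must be positive")
    sv_per_bp = sv_amount / sv_footprint_bp
    epi_per_bp = epi_amount / epi_footprint_bp
    return sv_per_bp / epi_per_bp


# Hypothetical numbers: 10 units captured over a 50 kbp sequence-variable
# footprint versus 20 units captured over a 500 kbp epigenetic footprint.
fold = footprint_normalized_fold(10, 50_000, 20, 500_000)
```

With these hypothetical inputs the raw amount of epigenetic DNA is twice that of the sequence-variable DNA, yet the footprint-normalized value is about 5-fold in favor of the sequence-variable set, which would fall at the boundary of the 4.5- to 5.0-fold and 5.0- to 5.5-fold ranges recited above.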

a. Epigenetic Target Region Set

The epigenetic target region set may comprise one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells and from healthy cells, e.g., non-neoplastic circulating cells. Exemplary types of such regions are discussed in detail herein. The epigenetic target region set may also comprise one or more control regions, e.g., as described herein.

In some embodiments, the epigenetic target region set has a footprint of at least 100 kbp, e.g., at least 200 kbp, at least 300 kbp, or at least 400 kbp. In some embodiments, the epigenetic target region set has a footprint in the range of 100 kbp-20 Mbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, 1.5-2 Mbp, 2-3 Mbp, 3-4 Mbp, 4-5 Mbp, 5-6 Mbp, 6-7 Mbp, 7-8 Mbp, 8-9 Mbp, 9-10 Mbp, or 10-20 Mbp. In some embodiments, the epigenetic target region set has a footprint of at least 20 Mbp.

i. Hypermethylation Variable Target Regions

In some embodiments, the epigenetic target region set comprises one or more hypermethylation variable target regions. In general, hypermethylation variable target regions refer to regions where an increase in the level of observed methylation, e.g., in a cfDNA sample, indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. For example, hypermethylation of promoters of tumor suppressor genes has been observed repeatedly. See, e.g., Kang et al., Genome Biol. 18:53 (2017) and references cited therein. In another example, as discussed above, hypermethylation variable target regions can include regions that do not necessarily differ in methylation in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., have more methylation) relative to cfDNA that is typical in healthy subjects. Where, for example, the presence of a cancer results in increased cell death such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypermethylation variable target regions.

An extensive discussion of methylation variable target regions in colorectal cancer is provided in Lam et al., Biochim Biophys Acta. 1866:106-20 (2016). These include VIM, SEPT9, ITGA4, OSM4, GATA4 and NDRG4. An exemplary set of hypermethylation variable target regions based on colorectal cancer (CRC) studies is provided in Table 1. Many of these genes likely have relevance to cancers beyond colorectal cancer; for example, TP53 is widely recognized as a critically important tumor suppressor and hypermethylation-based inactivation of this gene may be a common oncogenic mechanism.

TABLE 1 Exemplary Hypermethylation Target Regions based on CRC studies.

Gene Name    Additional Gene Name    Chromosome
VIM                                  chr10
SEPT9                                chr17
CYCD2        CCND2                   chr12
TFPI2                                chr7
GATA4                                chr8
RARB2        RARB                    chr3
p16INK4a     CDKN2A                  chr9
MGMT         MGMT                    chr10
APC                                  chr5
NDRG4                                chr16
HLTF                                 chr3
HPP1         TMEFF2                  chr2
hMLH1        MLH1                    chr3
RASSF1A      RASSF1                  chr3
CDH13                                chr16
IGFBP3                               chr7
ITGA4                                chr2

In some embodiments, the hypermethylation variable target regions comprise a plurality of loci listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1. For example, for each locus included as a target region, there may be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene, or in the promoter region of the gene. In some embodiments, the one or more probes bind within 300 bp of the transcription start site of a gene in Table 1, e.g., within 200 or 100 bp.
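The distance criterion above (a probe binding within 300, 200, or 100 bp of a transcription start site) can be illustrated with a short check. This is a minimal sketch under stated assumptions: the hybridization site is represented as a half-open interval of coordinates on the same chromosome as the transcription start site, and a probe is treated as "within" the window if the nearest base of its hybridization site is no farther than the window from the transcription start site. The function name and all coordinates are hypothetical.

```python
def within_tss_window(site_start: int, site_end: int, tss: int,
                      window_bp: int = 300) -> bool:
    """True if the hybridization site [site_start, site_end) lies within
    window_bp of the transcription start site (nearest-base distance)."""
    if site_end <= site_start:
        raise ValueError("hybridization site must have positive length")
    if site_start <= tss < site_end:
        distance = 0                       # site overlaps the TSS itself
    elif tss < site_start:
        distance = site_start - tss        # site lies downstream in coordinates
    else:
        distance = tss - (site_end - 1)    # site lies upstream in coordinates
    return distance <= window_bp
```

For instance, with a hypothetical 120-base hybridization site at coordinates 1000 to 1120 and a transcription start site at coordinate 900, the nearest-base distance is 100 bp, so the probe satisfies the 300, 200, and 100 bp windows alike. A stricter reading that requires the entire site to fall inside the window would need a different distance calculation.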

Methylation variable target regions in various types of lung cancer are discussed in detail, e.g., in Ooki et al., Clin. Cancer Res. 23:7141-52 (2017); Belinsky, Annu. Rev. Physiol. 77:453-74 (2015); Hulbert et al., Clin. Cancer Res. 23:1998-2005 (2017); Shi et al., BMC Genomics 18:901 (2017); Schneider et al., BMC Cancer. 11:102 (2011); Lissa et al., Transl Lung Cancer Res 5(5):492-504 (2016); Skvortsova et al., Br. J. Cancer. 94(10):1492-1495 (2006); Kim et al., Cancer Res. 61:3419-3424 (2001); Furonaka et al., Pathology International 55:303-309 (2005); Gomes et al., Rev. Port. Pneumol. 20:20-30 (2014); Kim et al., Oncogene. 20:1765-70 (2001); Hopkins-Donaldson et al., Cell Death Differ. 10:356-64 (2003); Kikuchi et al., Clin. Cancer Res. 11:2954-61 (2005); Heller et al., Oncogene 25:959-968 (2006); Licchesi et al., Carcinogenesis. 29:895-904 (2008); Guo et al., Clin. Cancer Res. 10:7917-24 (2004); Palmisano et al., Cancer Res. 63:4620-4625 (2003); and Toyooka et al., Cancer Res. 61:4556-4560 (2001).

An exemplary set of hypermethylation variable target regions based on lung cancer studies is provided in Table 2. Many of these genes likely have relevance to cancers beyond lung cancer; for example, Casp8 (Caspase 8) is a key enzyme in programmed cell death and hypermethylation-based inactivation of this gene may be a common oncogenic mechanism not limited to lung cancer. Additionally, a number of genes appear in both Tables 1 and 2, indicating generality.

TABLE 2 Exemplary Hypermethylation Target Regions based on Lung Cancer studies

Gene Name     Chromosome
MARCH11       chr5
TAC1          chr7
TCF21         chr6
SHOX2         chr3
p16           chr3
Casp8         chr2
CDH13         chr16
MGMT          chr10
MLH1          chr3
MSH2          chr2
TSLC1         chr11
APC           chr5
DKK1          chr10
DKK3          chr11
LKB1          chr11
WIF1          chr12
RUNX3         chr1
GATA4         chr8
GATA5         chr20
PAX5          chr9
E-Cadherin    chr16
H-Cadherin    chr16

Any of the foregoing embodiments concerning target regions identified in Table 2 may be combined with any of the embodiments described above concerning target regions identified in Table 1. In some embodiments, the hypermethylation variable target regions comprise a plurality of loci listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2.
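As a simple illustration of the percentage thresholds recited above, the following sketch computes which threshold a candidate panel meets relative to the loci of Table 1 or Table 2. The gene names are those listed in the first column of each table; the example panel and the function name are hypothetical, and loci are matched by exact name only, so aliases (e.g., hMLH1 and MLH1) are not reconciled in this sketch.

```python
# Gene names from the first column of Tables 1 and 2.
TABLE_1 = {
    "VIM", "SEPT9", "CYCD2", "TFPI2", "GATA4", "RARB2", "p16INK4a", "MGMT",
    "APC", "NDRG4", "HLTF", "HPP1", "hMLH1", "RASSF1A", "CDH13", "IGFBP3",
    "ITGA4",
}
TABLE_2 = {
    "MARCH11", "TAC1", "TCF21", "SHOX2", "p16", "Casp8", "CDH13", "MGMT",
    "MLH1", "MSH2", "TSLC1", "APC", "DKK1", "DKK3", "LKB1", "WIF1", "RUNX3",
    "GATA4", "GATA5", "PAX5", "E-Cadherin", "H-Cadherin",
}
THRESHOLDS = (10, 20, 30, 40, 50, 60, 70, 80, 90, 100)  # percent


def highest_threshold_met(panel, loci):
    """Largest recited percentage of `loci` covered by `panel`, or None."""
    covered = len(set(panel) & set(loci))
    # Integer comparison avoids floating-point rounding at the boundaries.
    met = [t for t in THRESHOLDS if 100 * covered >= t * len(loci)]
    return max(met) if met else None


# Hypothetical panel targeting four of the seventeen Table 1 loci.
example_panel = {"VIM", "GATA4", "MGMT", "APC"}
```

Here the hypothetical panel covers 4 of 17 Table 1 loci (about 23.5%), so it meets the "at least 20%" threshold but not the "at least 30%" threshold.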

Additional hypermethylation target regions may be obtained, e.g., from the Cancer Genome Atlas. Kang et al., Genome Biology 18:53 (2017), describe construction of a probabilistic method called CancerLocator using hypermethylation target regions from breast, colon, kidney, liver, and lung. In some embodiments, the hypermethylation target regions can be specific to one or more types of cancer. Accordingly, in some embodiments, the hypermethylation target regions include one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.

In some embodiments, where different epigenetic target regions are captured from the first and second subsamples, the epigenetic target regions captured from the first subsample comprise hypermethylation variable target regions.

ii. Hypomethylation Variable Target Regions

Global hypomethylation is a commonly observed phenomenon in various cancers. See, e.g., Hon et al., Genome Res. 22:246-258 (2012) (breast cancer); Ehrlich, Epigenomics 1:239-259 (2009) (review article noting observations of hypomethylation in colon, ovarian, prostate, leukemia, hepatocellular, and cervical cancers). For example, regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells may show reduced methylation in tumor cells. Accordingly, in some embodiments, the epigenetic target region set includes hypomethylation variable target regions, where a decrease in the level of observed methylation indicates an increased likelihood that a sample (e.g., of cfDNA) contains DNA produced by neoplastic cells, such as tumor or cancer cells. In another example, as discussed above, hypomethylation variable target regions can include regions that do not necessarily differ in methylation in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ in methylation (e.g., are less methylated) relative to cfDNA that is typical in healthy subjects. Where, for example, the presence of a cancer results in increased cell death such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such hypomethylation variable target regions.

In some embodiments, hypomethylation variable target regions include repeated elements and/or intergenic regions. In some embodiments, repeated elements include one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.

Exemplary specific genomic regions that show cancer-associated hypomethylation include nucleotides 8403565-8953708 and 151104701-151106035 of human chromosome 1. In some embodiments, the hypomethylation variable target regions overlap or comprise one or both of these regions.

In some embodiments, where different epigenetic target regions are captured from the first and second subsamples, the epigenetic target regions captured from the second subsample comprise hypomethylation variable target regions.

iii. CTCF Binding Regions

CTCF is a DNA-binding protein that contributes to chromatin organization and often colocalizes with cohesin. Perturbation of CTCF binding sites has been reported in a variety of different cancers. See, e.g., Katainen et al., Nature Genetics, doi:10.1038/ng.3335, published online 8 Jun. 2015; Guo et al., Nat. Commun. 9:1520 (2018). CTCF binding results in recognizable patterns in cfDNA that can be detected by sequencing, e.g., through fragment length analysis. Details regarding sequencing-based fragment length analysis are provided in Snyder et al., Cell 164:57-68 (2016); WO 2018/009723; and US20170211143A1, each of which is incorporated herein by reference.

Thus, perturbations of CTCF binding result in variation in the fragmentation patterns of cfDNA. As such, CTCF binding sites represent a type of fragmentation variable target regions.

There are many known CTCF binding sites. See, e.g., the CTCFBSDB (CTCF Binding Site Database), available on the Internet at insulatordb.uthsc.edu/; Cuddapah et al., Genome Res. 19:24-32 (2009); Martin et al., Nat. Struct. Mol. Biol. 18:708-14 (2011); Rhee et al., Cell. 147:1408-19 (2011), each of which is incorporated by reference. Exemplary CTCF binding sites are at nucleotides 56014955-56016161 on chromosome 8 and nucleotides 95359169-95360473 on chromosome 13.

Accordingly, in some embodiments, the epigenetic target region set includes CTCF binding regions. In some embodiments, the CTCF binding regions comprise at least 10, 20, 50, 100, 200, or 500 CTCF binding regions, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 CTCF binding regions, e.g., such as CTCF binding regions described above or in one or more of CTCFBSDB or the Cuddapah et al., Martin et al., or Rhee et al. articles cited above.

In some embodiments, at least some of the CTCF sites can be methylated or unmethylated, wherein the methylation state is correlated with whether or not the cell is a cancer cell. In some embodiments, the epigenetic target region set comprises at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream regions of the CTCF binding sites.
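By way of illustration only, the upstream and downstream regions described above can be sketched as fixed-width windows around a binding site. The following Python sketch is not part of the disclosed methods: the names `Region` and `flank_windows`, the half-open coordinate convention, the plus-strand orientation, and the 500 bp flank are all assumptions made for the example. Only the chromosome 8 coordinates are taken from the exemplary CTCF binding site above.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def flank_windows(site: Region, flank_bp: int) -> Tuple[Region, Region]:
    """Return (upstream, downstream) windows of flank_bp around a site.

    Assumes half-open [start, end) coordinates on the plus strand; the
    upstream window is clipped so that it never starts below position 0.
    """
    if flank_bp <= 0:
        raise ValueError("flank_bp must be positive")
    upstream = Region(site.chrom, max(0, site.start - flank_bp), site.start)
    downstream = Region(site.chrom, site.end, site.end + flank_bp)
    return upstream, downstream


# Exemplary CTCF binding site from the text; the 500 bp flank is hypothetical.
ctcf_site = Region("chr8", 56014955, 56016161)
upstream, downstream = flank_windows(ctcf_site, 500)
print(upstream)
print(downstream)
```

The same helper could be applied to transcription start sites, for which analogous upstream and downstream regions are described.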

iv. Transcription Start Sites

Transcription start sites may also show perturbations in neoplastic cells. For example, nucleosome organization at various transcription start sites in healthy cells of the hematopoietic lineage, which contributes substantially to cfDNA in healthy individuals, may differ from nucleosome organization at those transcription start sites in neoplastic cells. This results in different cfDNA patterns that can be detected by sequencing, as discussed generally in Snyder et al., Cell 164:57-68 (2016); WO 2018/009723; and US20170211143A1. In another example, transcription start sites can include those that do not necessarily differ epigenetically in cancerous tissue relative to DNA from healthy tissue of the same type, but do differ epigenetically (e.g., with respect to nucleosome organization) relative to cfDNA that is typical in healthy subjects. Where, for example, the presence of a cancer results in increased cell death such as apoptosis of cells of the tissue type corresponding to the cancer, such a cancer can be detected at least in part using such transcription start sites.

Thus, perturbations of transcription start sites also result in variation in the fragmentation patterns of cfDNA. As such, transcription start sites also represent a type of fragmentation variable target regions.

Human transcriptional start sites are available from DBTSS (DataBase of Human Transcription Start Sites), available on the Internet at dbtss.hgc.jp and described in Yamashita et al., Nucleic Acids Res. 34 (Database issue): D86-D89 (2006), which is incorporated herein by reference.

Accordingly, in some embodiments, the epigenetic target region set includes transcriptional start sites. In some embodiments, the transcriptional start sites comprise at least 10, 20, 50, 100, 200, or 500 transcriptional start sites, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 transcriptional start sites, e.g., such as transcriptional start sites listed in DBTSS. In some embodiments, at least some of the transcription start sites can be methylated or unmethylated, wherein the methylation state is correlated with whether or not the cell is a cancer cell. In some embodiments, the epigenetic target region set comprises at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream regions of the transcription start sites.

v. Focal Amplifications

Although focal amplifications are somatic mutations, they can be detected by sequencing based on read frequency in a manner analogous to approaches for detecting certain epigenetic changes such as changes in methylation. As such, regions that may show focal amplifications in cancer can be included in the epigenetic target region set and may comprise one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1. For example, in some embodiments, the epigenetic target region set comprises at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
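As a non-limiting sketch of detection based on read frequency, per-gene read counts from a test sample can be compared with counts from a reference sample after normalization. Everything in the following Python example is an assumption made for illustration: the function names, the median-based normalization, the 2-fold cutoff, and the read counts are hypothetical and are not taken from the present disclosure.

```python
from statistics import median


def amplification_folds(sample_counts, reference_counts):
    """Return per-gene fold change of median-normalized read counts.

    Each count is divided by the median count of its own sample so that
    differences in overall sequencing yield cancel out.
    """
    sample_median = median(sample_counts.values())
    reference_median = median(reference_counts.values())
    folds = {}
    for gene, count in sample_counts.items():
        sample_norm = count / sample_median
        reference_norm = reference_counts[gene] / reference_median
        folds[gene] = sample_norm / reference_norm
    return folds


def flag_amplified(folds, min_fold=2.0):
    """List genes whose fold change meets the (hypothetical) cutoff."""
    return sorted(gene for gene, fold in folds.items() if fold >= min_fold)


# Hypothetical read counts, for illustration only.
sample = {"ERBB2": 3000, "EGFR": 1000, "KRAS": 1000}
reference = {"ERBB2": 1000, "EGFR": 1000, "KRAS": 1000}
folds = amplification_folds(sample, reference)
print(flag_amplified(folds))
```

A production analysis would typically also correct for GC content and capture efficiency, which this sketch omits.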

vi. Methylation Control Regions

It can be useful to include control regions to facilitate data validation. In some embodiments, the epigenetic target region set includes control regions that are expected to be methylated or unmethylated in essentially all samples, regardless of whether the DNA is derived from a cancer cell or a normal cell. In some embodiments, the epigenetic target region set includes control hypomethylated regions that are expected to be hypomethylated in essentially all samples. In some embodiments, the epigenetic target region set includes control hypermethylated regions that are expected to be hypermethylated in essentially all samples.

b. Sequence-Variable Target Region Set

In some embodiments, the sequence-variable target region set comprises a plurality of regions known to undergo somatic mutations in cancer.

In some aspects, the sequence-variable target region set targets a plurality of different genes or genomic regions (“panel”) selected such that a determined proportion of subjects having a cancer exhibits a genetic variant or tumor marker in one or more different genes or genomic regions in the panel. The panel may be selected to limit a region for sequencing to a fixed number of base pairs. The panel may be selected to sequence a desired amount of DNA, e.g., by adjusting the affinity and/or amount of the probes as described elsewhere herein. The panel may be further selected to achieve a desired sequence read depth. The panel may be selected to achieve a desired sequence read depth or sequence read coverage for an amount of sequenced base pairs. The panel may be selected to achieve a theoretical sensitivity, a theoretical specificity, and/or a theoretical accuracy for detecting one or more genetic variants in a sample.

Probes for detecting the panel of regions can include those for detecting genomic regions of interest (hotspot regions) as well as nucleosome-aware probes (e.g., KRAS codons 12 and 13) and may be designed to optimize capture based on analysis of cfDNA coverage and fragment size variation impacted by nucleosome binding patterns and GC sequence composition. Regions used herein can also include non-hotspot regions optimized based on nucleosome positions and GC models.

Examples of listings of genomic locations of interest may be found in Table 3 and Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the SNVs of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, or 3 of the indels of Table 3. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the SNVs of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 4. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or 18 of the indels of Table 4. Each of these genomic locations of interest may be identified as a backbone region or hot-spot region for a given panel. An example of a listing of hot-spot genomic locations of interest may be found in Table 5. In some embodiments, a sequence-variable target region set used in the methods of the present disclosure comprises at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 of the genes of Table 5. Each hot-spot genomic region is listed with several characteristics, including the associated gene, chromosome on which it resides, the start and stop position of the genome representing the gene's locus, the length of the gene's locus in base pairs, the exons covered by the gene, and the critical feature (e.g., type of mutation) that a given genomic region of interest may seek to capture.

TABLE 3

Point Mutations (SNVs)                              Fusions
AKT1    ALK     APC     AR      ARAF    ARID1A      ALK
ATM     BRAF    BRCA1   BRCA2   CCND1   CCND2       FGFR2
CCNE1   CDH1    CDK4    CDK6    CDKN2A  CDKN2B      FGFR3
CTNNB1  EGFR    ERBB2   ESR1    EZH2    FBXW7       NTRK1
FGFR1   FGFR2   FGFR3   GATA3   GNA11   GNAQ        RET
GNAS    HNF1A   HRAS    IDH1    IDH2    JAK2        ROS1
JAK3    KIT     KRAS    MAP2K1  MAP2K2  MET
MLH1    MPL     MYC     NF1     NFE2L2  NOTCH1
NPM1    NRAS    NTRK1   PDGFRA  PIK3CA  PTEN
PTPN11  RAF1    RB1     RET     RHEB    RHOA
RIT1    ROS1    SMAD4   SMO     SRC     STK11
TERT    TP53    TSC1    VHL

TABLE 4

Point Mutations (SNVs)                              Fusions
AKT1    ALK     APC     AR      ARAF    ARID1A      ALK
ATM     BRAF    BRCA1   BRCA2   CCND1   CCND2       FGFR2
CCNE1   CDH1    CDK4    CDK6    CDKN2A  DDR2        FGFR3
CTNNB1  EGFR    ERBB2   ESR1    EZH2    FBW7        NTRK1
FGFR1   FGFR2   FGFR3   GATA3   GNA11   GNAQ        RET
GNAS    HNF1A   HRAS    IDH1    IDH2    JAK2        ROS1
JAK3    KIT     KRAS    MAP2K1  MAP2K2  MET
MLH1    MPL     MYC     NF1     NFE2L2  NOTCH1
NPM1    NRAS    NTRK1   PDGFRA  PIK3CA  PTEN
PTPN11  RAF1    RB1     RET     RHEB    RHOA
RIT1    ROS1    SMAD4   SMO     MAPK1   STK11
TERT    TP53    TSC1    VHL     MAPK3   MTOR
NTRK3

TABLE 5

Gene    Chromosome  Start Position  Stop Position  Length (bp)  Exons Covered                 Critical Feature
ALK     chr2        29446405        29446655       250          intron 19                     Fusion
ALK     chr2        29446062        29446197       135          intron 20                     Fusion
ALK     chr2        29446198        29446404       206          20                            Fusion
ALK     chr2        29447353        29447473       120          intron 19                     Fusion
ALK     chr2        29447614        29448316       702          intron 19                     Fusion
ALK     chr2        29448317        29448441       124          19                            Fusion
ALK     chr2        29449366        29449777       411          intron 18                     Fusion
ALK     chr2        29449778        29449950       172          18                            Fusion
BRAF    chr7        140453064       140453203      139          15                            BRAF V600
CTNNB1  chr3        41266007        41266254       247          3                             S37
EGFR    chr7        55240528        55240827       299          18 and 19                     G719 and deletions
EGFR    chr7        55241603        55241746       143          20                            Insertions/T790M
EGFR    chr7        55242404        55242523       119          21                            L858R
ERBB2   chr17       37880952        37881174       222          20                            Insertions
ESR1    chr6        152419857       152420111      254          10                            V534, P535, L536, Y537, D538
FGFR2   chr10       123279482       123279693      211          6                             S252
GATA3   chr10       8111426         8111571        145          5                             SS/Indels
GATA3   chr10       8115692         8116002        310          6                             SS/Indels
GNAS    chr20       57484395        57484488       93           8                             R844
IDH1    chr2        209113083       209113394      311          4                             R132
IDH2    chr15       90631809        90631989       180          4                             R140, R172
KIT     chr4        55524171        55524258       87           1
KIT     chr4        55561667        55561957       290          2
KIT     chr4        55564439        55564741       302          3
KIT     chr4        55565785        55565942       157          4
KIT     chr4        55569879        55570068       189          5
KIT     chr4        55573253        55573463       210          6
KIT     chr4        55575579        55575719       140          7
KIT     chr4        55589739        55589874       135          8
KIT     chr4        55592012        55592226       214          9
KIT     chr4        55593373        55593718       345          10 and 11                     557, 559, 560, 576
KIT     chr4        55593978        55594297       319          12 and 13                     V654
KIT     chr4        55595490        55595661       171          14                            T670, S709
KIT     chr4        55597483        55597595       112          15                            D716
KIT     chr4        55598026        55598174       148          16                            L783
KIT     chr4        55599225        55599368       143          17                            C809, R815, D816, L818, D820, S821F, N822, Y823
KIT     chr4        55602653        55602785       132          18                            A829P
KIT     chr4        55602876        55602996       120          19
KIT     chr4        55603330        55603456       126          20
KIT     chr4        55604584        55604733       149          21
KRAS    chr12       25378537        25378717       180          4                             A146
KRAS    chr12       25380157        25380356       199          3                             Q61
KRAS    chr12       25398197        25398328       131          2                             G12/G13
MET     chr7        116411535       116412255      720          13, 14, intron 13, intron 14  MET exon 14 SS
NRAS    chr1        115256410       115256609      199          3                             Q61
NRAS    chr1        115258660       115258791      131          2                             G12/G13
PIK3CA  chr3        178935987       178936132      145          10                            E545K
PIK3CA  chr3        178951871       178952162      291          21                            H1047R
PTEN    chr10       89692759        89693018       259          5                             R130
SMAD4   chr18       48604616        48604849       233          12                            D537
TERT    chr5        1294841         1295512        671          promoter                      chr5: 1295228
TP53    chr17       7573916         7574043        127          11                            Q331, R337, R342
TP53    chr17       7577008         7577165        157          8                             R273
TP53    chr17       7577488         7577618        130          7                             R248
TP53    chr17       7578127         7578299        172          6                             R213/Y220
TP53    chr17       7578360         7578564        204          5                             R175/Deletions
TP53    chr17       7579301         7579600        299          4

Totals (Length, bp): 12574 (total target region); 16330 (total probe coverage)
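For illustration, a few rows of Table 5 can be held as simple records and queried by genomic position. The sketch below is hypothetical and not part of the disclosure: the `HotSpot` record, the helper names, and the choice to treat start and stop positions as inclusive bounds are assumptions. The coordinates and lengths are copied from the BRAF, KRAS (G12/G13), and TERT rows of Table 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HotSpot:
    gene: str
    chrom: str
    start: int
    stop: int
    length_bp: int


# Three rows copied from Table 5; the remaining rows are omitted for brevity.
ROWS = [
    HotSpot("BRAF", "chr7", 140453064, 140453203, 139),
    HotSpot("KRAS", "chr12", 25398197, 25398328, 131),
    HotSpot("TERT", "chr5", 1294841, 1295512, 671),
]


def length_consistent(row: HotSpot) -> bool:
    """Check that the listed length equals stop minus start."""
    return row.stop - row.start == row.length_bp


def find_hotspots(chrom: str, position: int, rows=ROWS):
    """Return genes whose region contains the position (bounds assumed inclusive)."""
    return [r.gene for r in rows if r.chrom == chrom and r.start <= position <= r.stop]


print(all(length_consistent(r) for r in ROWS))
print(find_hotspots("chr5", 1295228))  # TERT promoter position listed in Table 5
```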

Additionally or alternatively, suitable target region sets are available from the literature. For example, Gale et al., PLoS One 13: e0194630 (2018), which is incorporated herein by reference, describes a panel of 35 cancer-related gene targets that can be used as part or all of a sequence-variable target region set. These 35 targets are AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.

In some embodiments, the sequence-variable target region set comprises target regions from at least 10, 20, 30, or 35 cancer-related genes, such as the cancer-related genes listed above.

In some embodiments, the sequence-variable target region set has a footprint of at least 50 kbp, e.g., at least 100 kbp, at least 200 kbp, at least 300 kbp, or at least 400 kbp. In some embodiments, the sequence-variable target region set has a footprint in the range of 100-2000 kbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp or 1.5-2 Mbp. In some embodiments, the sequence-variable target region set has a footprint of at least 2 Mbp.
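As an illustrative sketch, the footprint of a target region set can be computed as the total number of base pairs covered after overlapping regions are merged. The function name `footprint_bp`, the half-open interval convention, and the example regions below are assumptions made for this example rather than features of the disclosure.

```python
from collections import defaultdict


def footprint_bp(regions):
    """Total base pairs covered by (chrom, start, end) regions.

    Intervals are treated as half-open [start, end); overlapping or
    touching intervals on the same chromosome are merged before summing.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps or touches the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Hypothetical regions: two overlapping on chr1, one on chr2.
example_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
print(footprint_bp(example_regions) / 1000, "kbp")
```

Merging before summing avoids double counting where probes for adjacent targets overlap.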

7. Subjects

In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a cancer. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having a tumor. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject having neoplasia. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject suspected of having neoplasia. In some embodiments, the DNA (e.g., cfDNA) is obtained from a subject in remission from a tumor, cancer, or neoplasia (e.g., following chemotherapy, surgical resection, radiation, or a combination thereof). In any of the foregoing embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia may be of the lung, colon, rectum, kidney, breast, prostate, or liver. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the lung. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the colon or rectum. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the breast. In some embodiments, the cancer, tumor, or neoplasia or suspected cancer, tumor, or neoplasia is of the prostate. In any of the foregoing embodiments, the subject may be a human subject.

8. Quantification

In some embodiments, epigenetic target regions captured from one or more of the first subsample, the treated first subsample, or the treated second subsample are quantified. For example, hypomethylation variable target regions may be quantified in the treated second subsample, and/or hypermethylation variable target regions may be quantified in the first subsample or treated first subsample. Quantification may be by any appropriate technique, e.g., quantitative amplification such as quantitative PCR. In some embodiments, quantification is based on sequencing data (e.g., number of sequencing reads or number of unique molecules sequenced).

Quantification of epigenetic target regions as discussed above can be used for determining a presence, absence, or likelihood of cancer in a subject. For example, a determination of the presence or absence of cancer can be based, at least in part, on whether the amount of hypermethylation variable target regions in the first subsample or treated first subsample and/or the amount of hypomethylation variable target regions in the treated second subsample exceeds a predetermined threshold. In some embodiments, such an amount can be used together with other data collected from the sample, e.g., the presence of mutations and/or other epigenetic features described elsewhere herein such as perturbations of transcription start sites and/or CTCF binding sites.
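The threshold comparison described above can be sketched as follows, purely by way of example. The record layout, the class labels `"hyper"` and `"hypo"`, the function names, and the threshold values are hypothetical. The sketch counts unique molecules per target-region class and reports whether any class exceeds its predetermined threshold, which corresponds to the "or" reading of "and/or" above.

```python
from collections import defaultdict


def count_unique_molecules(records):
    """Count distinct molecule identifiers per target-region class.

    records: iterable of (region_class, molecule_id) pairs; repeated
    molecule_ids represent multiple reads of one original molecule.
    """
    seen = defaultdict(set)
    for region_class, molecule_id in records:
        seen[region_class].add(molecule_id)
    return {region_class: len(ids) for region_class, ids in seen.items()}


def exceeds_threshold(counts, thresholds):
    """Return True if any region class exceeds its predetermined threshold."""
    return any(counts.get(name, 0) > limit for name, limit in thresholds.items())


# Hypothetical reads: "hyper" from the first subsample, "hypo" from the
# treated second subsample. Molecule m1 was read twice.
reads = [("hyper", "m1"), ("hyper", "m1"), ("hyper", "m2"),
         ("hyper", "m3"), ("hypo", "m4")]
counts = count_unique_molecules(reads)
print(counts, exceeds_threshold(counts, {"hyper": 2, "hypo": 5}))
```

In practice such a result would be one input among others, such as detected mutations or fragmentation features, rather than a standalone determination.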

9. Pooling of DNA from First and Second Subsamples or Portions Thereof.

In some embodiments, the methods comprise preparing a pool comprising at least a portion of the DNA of the second subsample (also referred to as the hypomethylated partition) and at least a portion of the DNA of the first subsample (also referred to as the hypermethylated partition). Target regions, e.g., including epigenetic target regions and/or sequence-variable target regions, may be captured from the pool. The steps of capturing a target region set from at least a portion of a subsample described elsewhere herein encompass capture steps performed on a pool comprising DNA from the first and second subsamples. A step of amplifying DNA in the pool may be performed before capturing target regions from the pool. The capturing step may have any of the features described elsewhere herein.

The epigenetic target regions may show differences in methylation levels and/or fragmentation patterns depending on whether they originated from a tumor or from healthy cells, or what type of tissue they originated from, as discussed elsewhere herein. The sequence-variable target regions may show differences in sequence depending on whether they originated from a tumor or from healthy cells.

Analysis of epigenetic target regions from the hypomethylated partition may be less informative in some applications than analysis of sequence-variable target regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. As such, in methods where sequence-variable target regions and epigenetic target regions are being captured, the latter may be captured to a lesser extent than one or more of the sequence-variable target regions from the hypermethylated and hypomethylated partitions and epigenetic target regions from the hypermethylated partition. For example, sequence-variable target regions can be captured from the portion of the hypomethylated partition not pooled with the hypermethylated partition, and the pool can be prepared with some (e.g., a majority, substantially all, or all) of the DNA from the hypermethylated partition and none or some (e.g., a minority) of the DNA from the hypomethylated partition. Such approaches can reduce or eliminate sequencing of epigenetic target regions from the hypomethylated partition, thereby reducing the amount of sequencing data that suffices for further analysis.

In some embodiments, including a minority of the DNA of the hypomethylated partition in the pool facilitates quantification of one or more epigenetic features (e.g., methylation or other epigenetic feature(s) discussed in detail elsewhere herein), e.g., on a relative basis.

In some embodiments, the pool comprises a minority of the DNA of the hypomethylated partition, e.g., less than about 50% of the DNA of the hypomethylated partition, such as less than or equal to about 45%, 40%, 35%, 30%, 25%, 20%, 15%, 10%, or 5% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 5%-25% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10%-20% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 10% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 15% of the DNA of the hypomethylated partition. In some embodiments, the pool comprises about 20% of the DNA of the hypomethylated partition.

In some embodiments, the pool comprises a portion of the hypermethylated partition, which may be at least about 50% of the DNA of the hypermethylated partition. For example, the pool may comprise at least about 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% of the DNA of the hypermethylated partition. In some embodiments, the pool comprises 50-55%, 55-60%, 60-65%, 65-70%, 70-75%, 75-80%, 80-85%, 85-90%, 90-95%, or 95-100% of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises all or substantially all of the hypermethylated partition.
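By way of a worked example only, the composition of such a pool can be computed from the amount of DNA in each partition and the percentage of each partition that is pooled. The function name, the use of nanograms as the unit, the input masses, and the default percentages (all of the hypermethylated partition and 15% of the hypomethylated partition) are assumptions chosen from within the ranges recited above.

```python
def pool_composition(hyper_ng, hypo_ng, hyper_pct=100, hypo_pct=15):
    """Split partition DNA (in ng) between the pool and the unpooled remainder.

    hyper_pct / hypo_pct: percentage of the hypermethylated and
    hypomethylated partitions, respectively, added to the pool.
    """
    if not (0 <= hyper_pct <= 100 and 0 <= hypo_pct <= 100):
        raise ValueError("percentages must be between 0 and 100")
    hyper_in_pool = hyper_ng * hyper_pct / 100
    hypo_in_pool = hypo_ng * hypo_pct / 100
    return {
        "hyper_in_pool_ng": hyper_in_pool,
        "hypo_in_pool_ng": hypo_in_pool,
        # Remainder available, e.g., for separate capture of
        # sequence-variable target regions.
        "hypo_unpooled_ng": hypo_ng - hypo_in_pool,
        "pool_total_ng": hyper_in_pool + hypo_in_pool,
    }


# Hypothetical partition yields: 20 ng hypermethylated, 100 ng hypomethylated.
composition = pool_composition(hyper_ng=20, hypo_ng=100)
print(composition)
```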

In some embodiments, the methods comprise preparing a first pool comprising at least a portion of the DNA of the hypomethylated partition. In some embodiments, the methods comprise preparing a second pool comprising at least a portion of the DNA of the hypermethylated partition. In some embodiments, the first pool further comprises a portion of the DNA of the hypermethylated partition. In some embodiments, the second pool further comprises a portion of the DNA of the hypomethylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and optionally a minority of the DNA of the hypermethylated partition. In some embodiments, the second pool comprises a majority of the DNA of the hypermethylated partition and a minority of the DNA of the hypomethylated partition. In some embodiments involving an intermediately methylated partition, the second pool comprises at least a portion of the DNA of the intermediately methylated partition, e.g., a majority of the DNA of the intermediately methylated partition. In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition, and the second pool comprises a majority of the DNA of the hypermethylated partition and a majority of the DNA of the intermediately methylated partition.

In some embodiments, the methods comprise capturing at least a first set of target regions from the first pool, e.g., wherein the first pool is as set forth in any of the embodiments above. In some embodiments, the first set comprises sequence-variable target regions. In some embodiments, the first set comprises hypomethylation variable target regions and/or fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions and fragmentation variable target regions. In some embodiments, the first set comprises sequence-variable target regions, hypomethylation variable target regions and fragmentation variable target regions. A step of amplifying DNA in the first pool may be performed before this capture step. In some embodiments, capturing the first set of target regions from the first pool comprises contacting the DNA of the first pool with a first set of target-specific probes. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions. In some embodiments, the first set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions, hypomethylation variable target regions and/or fragmentation variable target regions.

In some embodiments, the methods comprise capturing a second set of target regions or plurality of sets of target regions from the second pool, e.g., wherein the second pool is as set forth in any of the embodiments above. In some embodiments, the second plurality comprises epigenetic target regions, such as hypermethylation variable target regions and/or fragmentation variable target regions. In some embodiments, the second plurality comprises sequence-variable target regions and epigenetic target regions, such as hypermethylation variable target regions and/or fragmentation variable target regions. A step of amplifying DNA in the second pool may be performed before this capture step. In some embodiments, capturing the second plurality of sets of target regions from the second pool comprises contacting the DNA of the second pool with a second set of target-specific probes, wherein the second set of target-specific probes comprises target-binding probes specific for the sequence-variable target regions and target-binding probes specific for the epigenetic target regions. In some embodiments, the first set of target regions and the second set of target regions are not identical. For example, the first set of target regions may comprise one or more target regions not present in the second set of target regions. Alternatively or in addition, the second set of target regions may comprise one or more target regions not present in the first set of target regions. In some embodiments, at least one hypermethylation variable target region is captured from the second pool but not from the first pool. In some embodiments, a plurality of hypermethylation variable target regions are captured from the second pool but not from the first pool. In some embodiments, the first set of target regions comprises sequence-variable target regions and/or the second set of target regions comprises epigenetic target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions and fragmentation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions. In some embodiments, the first set of target regions comprises sequence-variable target regions, fragmentation variable target regions, and hypomethylation variable target regions; and the second set of target regions comprises epigenetic target regions, such as hypermethylation variable target regions and fragmentation variable target regions.

In some embodiments, the first pool comprises a majority of the DNA of the hypomethylated partition and a portion of the DNA of the hypermethylated partition (e.g., about half), and the second pool comprises a portion of the DNA of the hypermethylated partition (e.g., about half). In some such embodiments, the first set of target regions comprises sequence-variable target regions and/or the second set of target regions comprises epigenetic target regions. The sequence-variable target regions and/or the epigenetic target regions may be as set forth in any of the embodiments described elsewhere herein.

10. Sequencing

In general, sample nucleic acids flanked by adapters with or without prior amplification can be subject to sequencing. Sequencing methods include, for example, Sanger sequencing, high-throughput sequencing, pyrosequencing, sequencing-by-synthesis, single-molecule sequencing, nanopore sequencing, semiconductor sequencing, sequencing-by-ligation, sequencing-by-hybridization, Digital Gene Expression (Helicos), Next generation sequencing (NGS), Single Molecule Sequencing by Synthesis (SMSS) (Helicos), massively-parallel sequencing, Clonal Single Molecule Array (Solexa), shotgun sequencing, Ion Torrent, Oxford Nanopore, Roche Genia, Maxam-Gilbert sequencing, primer walking, and sequencing using PacBio, SOLiD, Ion Torrent, or Nanopore platforms. Sequencing reactions can be performed in a variety of sample processing units, which may include multiple lanes, multiple channels, multiple wells, or other means of processing multiple sample sets substantially simultaneously. A sample processing unit can also include multiple sample chambers to enable processing of multiple runs simultaneously.

In some embodiments, a sequencing step is performed on a library comprising a captured set of target regions, which may comprise any of the target region sets described herein. In some embodiments, a sequencing step is performed on a library comprising a subsample that has not undergone capture/enrichment (e.g., a whole genome subsample). For example, target regions may be captured from the first subsample and the second subsample and then sequenced; or target regions may be captured from the first subsample and combined with the second subsample after processing such as contacting and tagging steps; or target regions may be captured from the second subsample and combined with the first subsample after processing such as contacting and tagging steps; or both the first and second subsamples may be processed and combined without undergoing capture/enrichment.

The sequencing reactions can be performed on one or more forms of nucleic acids at least one of which is known to contain markers of cancer or of other disease. The sequencing reactions can also be performed on any nucleic acid fragments present in the sample. In some embodiments, sequence coverage of the genome may be less than 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 99.9% or 100%. In some embodiments, the sequencing reactions may provide for sequence coverage of at least 5%, 10%, 15%, 20%, 25%, 30%, 40%, 50%, 60%, 70%, or 80% of the genome. Sequence coverage can be obtained for at least 5, 10, 20, 70, 100, 200 or 500 different genes, or at most 5000, 2500, 1000, 500 or 100 different genes.

Simultaneous sequencing reactions may be performed using multiplex sequencing. In some cases, cell-free nucleic acids may be sequenced with at least 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other cases, cell-free nucleic acids may be sequenced with less than 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. Sequencing reactions may be performed sequentially or simultaneously. Subsequent data analysis may be performed on all or part of the sequencing reactions. In some cases, data analysis may be performed on at least 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. In other cases, data analysis may be performed on less than 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 50000, or 100,000 sequencing reactions. An exemplary read depth is 1000-50000 reads per locus (base).

a. Differential Depth of Sequencing

In some embodiments, nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set. For example, the depth of sequencing for nucleic acids corresponding to the sequence-variable target region set may be at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold greater, or 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, 14- to 15-fold, or 15- to 100-fold greater, than the depth of sequencing for nucleic acids corresponding to the epigenetic target region set. In some embodiments, said depth of sequencing is at least 2-fold greater. In some embodiments, said depth of sequencing is at least 5-fold greater. In some embodiments, said depth of sequencing is at least 10-fold greater. In some embodiments, said depth of sequencing is 4- to 10-fold greater. In some embodiments, said depth of sequencing is 4- to 100-fold greater. Each of these embodiments refers to the extent to which nucleic acids corresponding to the sequence-variable target region set are sequenced to a greater depth of sequencing than nucleic acids corresponding to the epigenetic target region set.
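By way of non-limiting illustration, the fold comparison described above can be sketched in a few lines of Python. The function names and the example depths below are hypothetical and are not taken from this disclosure; the sketch merely shows how a measured fold difference might be checked against one of the recited ranges.

```python
def fold_depth(seq_variable_depth: float, epigenetic_depth: float) -> float:
    """Return how many fold deeper the sequence-variable set was sequenced."""
    if epigenetic_depth <= 0:
        raise ValueError("epigenetic depth must be positive")
    return seq_variable_depth / epigenetic_depth


def in_fold_range(fold: float, low: float, high: float) -> bool:
    """True if the fold difference lies within the closed range [low, high]."""
    return low <= fold <= high


# Hypothetical mean depths (reads per base) for the two target region sets.
fold = fold_depth(20000.0, 4000.0)
print(fold)                            # fold difference in depth
print(in_fold_range(fold, 4.0, 10.0))  # checks the "4- to 10-fold" embodiment
```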

In some embodiments, the captured cfDNA corresponding to the sequence-variable target region set and the captured cfDNA corresponding to the epigenetic target region set are sequenced concurrently, e.g., in the same sequencing cell (such as the flow cell of an Illumina sequencer) and/or in the same composition, which may be a pooled composition resulting from recombining separately captured sets or a composition obtained by capturing the cfDNA corresponding to the sequence-variable target region set and the cfDNA corresponding to the epigenetic target region set in the same vessel.

11. Analysis

In some embodiments, a method described herein comprises identifying the presence of DNA produced by a tumor (or neoplastic cells, or cancer cells).

The present methods can be used to diagnose the presence of conditions, particularly cancer, in a subject, to characterize conditions (e.g., staging cancer or determining heterogeneity of a cancer), to monitor response to treatment of a condition, and to effect prognosis of the risk of developing a condition or of the subsequent course of a condition. The present disclosure can also be useful in determining the efficacy of a particular treatment option. Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy.

Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of disease.

The types and number of cancers that may be detected may include blood cancers, brain cancers, lung cancers, skin cancers, nose cancers, throat cancers, liver cancers, bone cancers, lymphomas, pancreatic cancers, bowel cancers, rectal cancers, thyroid cancers, bladder cancers, kidney cancers, mouth cancers, stomach cancers, solid state tumors, heterogeneous tumors, homogenous tumors and the like. Type and/or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversions, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.

Genetic data can also be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging. Genetic profile data may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.

Further, the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject. Such methods can include, e.g., generating a genetic profile of extracellular polynucleotides derived from the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses. In some embodiments, an abnormal condition is cancer. In some embodiments, the abnormal condition may be one resulting in a heterogeneous genomic population. In the example of cancer, some tumors are known to comprise tumor cells in different stages of the cancer. In other examples, heterogeneity may comprise multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.

The present methods can be used to generate a profile, fingerprint, or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. This set of data may comprise copy number variation, epigenetic variation, and mutation analyses alone or in combination.

The present methods can be used to diagnose, prognose, monitor or observe cancers, or other diseases. In some embodiments, the methods herein do not involve diagnosing, prognosing or monitoring a fetus and as such are not directed to non-invasive prenatal testing. In other embodiments, these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.

An exemplary method for molecular tag identification of MBD-bead partitioned libraries through NGS, which includes a step of subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, is as follows:

1. Physical partitioning of an extracted DNA sample (e.g., extracted blood plasma DNA from a human sample, which has optionally been subjected to target capture as described herein) using a methyl-binding domain protein-bead purification kit, saving all elutions from the process for downstream processing.
2. Parallel application of differential molecular tags and NGS-enabling adapter sequences to each partition. For example, the hypermethylated, residual methylation (‘wash’), and hypomethylated partitions are ligated with NGS-adapters with molecular tags.
3. Subjecting the hypermethylated partition to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein.
4. Re-combining all molecular tagged partitions, and subsequent amplification using adapter-specific DNA primer sequences.
5. Capture/hybridization of the re-combined and amplified total library, targeting genomic regions of interest (e.g., cancer-specific genetic variants and differentially methylated regions).
6. Re-amplification of the captured DNA library, appending a sample tag. Different samples are pooled and assayed in multiplex on an NGS instrument.
7. Bioinformatics analysis of NGS data, with the molecular tags being used to identify unique molecules, as well as deconvolution of the sample into molecules that were differentially MBD-partitioned. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing/variant detection.
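By way of non-limiting illustration, the bioinformatics analysis of step 7 can be sketched as follows. The partition tag sequences, the read tuple layout, and the function names are hypothetical and are used only for this sketch; in particular, the sketch assumes that each read carries a separate partition tag and molecular barcode, which is one possible arrangement rather than a requirement of the method.

```python
from collections import defaultdict

# Hypothetical partition tags; actual tag sequences are not specified here.
PARTITION_TAGS = {"AACC": "hyper", "GGTT": "wash", "TTGA": "hypo"}


def deconvolve(reads):
    """Count unique molecules per (region, partition).

    Each read is a tuple (partition_tag, molecular_barcode, region, start, end).
    Reads sharing partition, barcode, and coordinates count as one molecule.
    """
    molecules = defaultdict(set)
    for tag, barcode, region, start, end in reads:
        partition = PARTITION_TAGS.get(tag)
        if partition is None:
            continue  # unrecognized tag: skip rather than guess a partition
        molecules[(region, partition)].add((barcode, start, end))
    return {key: len(ids) for key, ids in molecules.items()}


def hyper_fraction(counts, region):
    """Fraction of a region's unique molecules found in the hyper partition."""
    total = sum(n for (r, _), n in counts.items() if r == region)
    return counts.get((region, "hyper"), 0) / total if total else 0.0


reads = [
    ("AACC", "ACGT", "regionA", 100, 268),
    ("AACC", "ACGT", "regionA", 100, 268),  # duplicate of the read above
    ("AACC", "TTAG", "regionA", 105, 270),
    ("TTGA", "CCAT", "regionA", 98, 260),
    ("NNNN", "CCAT", "regionA", 98, 260),   # unknown partition tag, ignored
]
counts = deconvolve(reads)
print(counts)
print(hyper_fraction(counts, "regionA"))
```

The fraction of unique molecules recovered in the hypermethylated partition is used here only as a simple stand-in for the relative 5-methylcytosine information mentioned in step 7.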

In some embodiments of methods described herein, including but not limited to the method shown above, the molecular tags consist of nucleotides that are not altered by the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein (e.g., mC along with A, T, and G where the procedure is bisulfite conversion or any other conversion that does not affect mC; hmC along with A, T, and G where the procedure is a conversion that does not affect hmC; etc.). In some embodiments of methods described herein, including but not limited to the method shown above, the molecular tags do not comprise nucleotides that are altered by the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein (e.g., the tags do not comprise unmodified C where the procedure is bisulfite conversion or any other conversion that affects C; the tags do not comprise mC where the procedure is a conversion that affects mC; the tags do not comprise hmC where the procedure is a conversion that affects hmC; etc.).
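By way of non-limiting illustration, the constraint on tag composition described above can be expressed as a simple check. The one-letter notation ('C' for unmodified cytosine, 'M' for mC, 'H' for hmC), the procedure names, and the mapping of procedures to altered bases are assumptions made only for this sketch.

```python
# Bases assumed to be altered by each procedure (illustrative mapping only).
ALTERED_BASES = {
    "bisulfite": {"C"},          # unmodified C is converted; mC is not
    "mc_conversion": {"M"},      # hypothetical conversion that affects mC
    "hmc_conversion": {"H"},     # hypothetical conversion that affects hmC
}


def tag_is_conversion_safe(tag: str, procedure: str) -> bool:
    """True if the tag contains no base that the procedure would alter.

    Tags use 'C' for unmodified cytosine, 'M' for mC, and 'H' for hmC
    (a notation adopted only for this sketch).
    """
    altered = ALTERED_BASES[procedure]
    return not any(base in altered for base in tag.upper())


print(tag_is_conversion_safe("ATGMAT", "bisulfite"))  # tag built from mC, A, T, G
print(tag_is_conversion_safe("ATGCAT", "bisulfite"))  # tag contains unmodified C
```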

In general, the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA may instead be performed before the step of parallel application of differential molecular tags and NGS-enabling adapter sequences to each partition. For example, this may be done where the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA is a separation, such as hmC-seal, and in such a case the separated populations may themselves be differentially tagged relative to each other. Such an exemplary method is as follows:

1. Physical partitioning of an extracted DNA sample (e.g., extracted blood plasma DNA from a human sample, which has optionally been subjected to target capture as described herein) using a methyl-binding domain protein-bead purification kit, saving all elutions from the process for downstream processing.
2. Subjecting the hypermethylated partition to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein.
3. Parallel application of differential molecular tags and NGS-enabling adapter sequences to each partition. For example, the hypermethylated partition (or where applicable, two or more sub-partitions of the hypermethylated partition), residual methylation (‘wash’) partition, and hypomethylated partition are ligated with NGS-adapters with molecular tags.
4. Re-combining all molecular tagged partitions, and subsequent amplification using adapter-specific DNA primer sequences.
5. Capture/hybridization of the re-combined and amplified total library, targeting genomic regions of interest (e.g., cancer-specific genetic variants and differentially methylated regions).
6. Re-amplification of the captured DNA library, appending a sample tag. Different samples are pooled and assayed in multiplex on an NGS instrument.
7. Bioinformatics analysis of NGS data, with the molecular tags being used to identify unique molecules, as well as deconvolution of the sample into molecules that were differentially MBD-partitioned. This analysis can yield information on relative 5-methylcytosine for genomic regions, concurrent with standard genetic sequencing/variant detection.

12. Exemplary Workflows

Exemplary workflows for partitioning and library preparation are provided herein. In some embodiments, some or all features of the partitioning and library preparation workflows may be used in combination.

a. Partitioning

In some embodiments, sample DNA (e.g., between 5 and 200 ng) is mixed with methyl binding domain (MBD) buffer and magnetic beads conjugated with MBD proteins and incubated overnight. Methylated DNA (hypermethylated DNA) binds the MBD protein on the magnetic beads during this incubation. Non-methylated DNA (hypomethylated DNA) or less methylated DNA (intermediately methylated DNA) is washed away from the beads with buffers containing increasing concentrations of salt. For example, one, two, or more fractions containing non-methylated, hypomethylated, and/or intermediately methylated DNA may be obtained from such washes. Finally, a high salt buffer is used to elute the heavily methylated DNA (hypermethylated DNA) from the MBD protein. In some embodiments, these washes result in three partitions (hypomethylated partition, intermediately methylated partition, and hypermethylated partition) of DNA having increasing levels of methylation.

In some embodiments, the three partitions of DNA are desalted and concentrated in preparation for the enzymatic steps of library preparation.

b. Library Preparation

In some embodiments (e.g., after concentrating the DNA in the partitions), the partitioned DNA is made ligatable, e.g., by extending the end overhangs of the DNA molecules, adding adenosine residues to the 3′ ends of fragments, and phosphorylating the 5′ end of each DNA fragment. DNA ligase and adapters are added to ligate each partitioned DNA molecule with an adapter on each end. These adapters contain partition tags (e.g., non-random, non-unique barcodes) that are distinguishable from the partition tags in the adapters used in the other partitions. Either before or after making the partitioned DNA ligatable and performing the ligation, at least one subsample (e.g., the hypomethylated partition, or the hypomethylated partition and the intermediately methylated partition if applicable) is digested with a methylation-dependent nuclease (e.g., a methylation-dependent restriction enzyme, such as FspEI). Optionally, the hypermethylated partition may be digested with a methylation-sensitive nuclease, such as a methylation-sensitive restriction enzyme (e.g., one or more, or each, of HpaII, BstUI, and Hin6I). Optionally, the hypermethylated partition may be subjected to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA, such as any of those described herein. Where the procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA further partitions the hypermethylated partition, the ligation of adapters should be performed after the procedure so that the sub-partitions of the hypermethylated partition can be differentially tagged. Then, the three (or more) partitions are pooled together and are amplified (e.g., by PCR, such as with primers specific for the adapters).
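By way of non-limiting illustration, fragments in the hypermethylated partition that contain recognition sites for the exemplary methylation-sensitive restriction enzymes can be identified in silico. The sketch below assumes the commonly reported recognition sequences CCGG (HpaII), CGCG (BstUI), and GCGC (Hin6I); the function name and the example fragment are hypothetical. Whether a site is actually cleaved depends on its methylation state, which cannot be inferred from sequence alone.

```python
# Recognition sequences assumed for this sketch; all three are palindromic,
# so scanning one strand is sufficient.
MSRE_SITES = {"HpaII": "CCGG", "BstUI": "CGCG", "Hin6I": "GCGC"}


def find_sites(sequence, sites=MSRE_SITES):
    """Return 0-based start positions of each enzyme's site in the sequence."""
    seq = sequence.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping sites
        hits[enzyme] = positions
    return hits


fragment = "ATCCGGTTGCGCGAA"  # hypothetical fragment sequence
print(find_sites(fragment))
```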

Following PCR, amplified DNA may be cleaned and concentrated prior to enrichment. The amplified DNA is contacted with a collection of probes described herein (which may be, e.g., biotinylated RNA probes) that target specific regions of interest. The mixture is incubated, e.g., overnight, e.g., in a salt buffer. The probes are captured (e.g., using streptavidin magnetic beads) and separated from the amplified DNA that was not captured, such as by a series of salt washes, thereby enriching the sample. After the enrichment, the enriched sample is amplified by PCR. In some embodiments, the PCR primers contain a sample tag, thereby incorporating the sample tag into the DNA molecules. In some embodiments, DNA from different samples is pooled together and then multiplex sequenced, e.g., using an Illumina NovaSeq sequencer.

C. Additional Features of Certain Disclosed Methods

1. Samples

A sample can be any biological sample isolated from a subject. A sample can be a bodily sample. Samples can include body tissues, such as known or suspected solid tumors, whole blood, platelets, serum, plasma, stool, red blood cells, white blood cells or leucocytes, endothelial cells, tissue biopsies, cerebrospinal fluid, synovial fluid, lymphatic fluid, ascites fluid, interstitial or extracellular fluid, the fluid in spaces between cells, including gingival crevicular fluid, bone marrow, pleural effusions, saliva, mucous, sputum, semen, sweat, and urine. Samples are preferably body fluids, particularly blood and fractions thereof, and urine. A sample can be in the form originally isolated from a subject or can have been subjected to further processing to remove or add components, such as cells, or enrich for one component relative to another. Thus, a preferred body fluid for analysis is plasma or serum containing cell-free nucleic acids. A sample can be isolated or obtained from a subject and transported to a site of sample analysis. The sample may be preserved and shipped at a desirable temperature, e.g., room temperature, 4° C., −20° C., and/or −80° C. A sample can be isolated or obtained from a subject at the site of the sample analysis. The subject can be a human, a mammal, an animal, a companion animal, a service animal, or a pet. The subject may have a cancer. The subject may not have cancer or a detectable cancer symptom. The subject may have been treated with one or more cancer therapies, e.g., any one or more of chemotherapies, antibodies, vaccines or biologics. The subject may be in remission. The subject may or may not be diagnosed as being susceptible to cancer or any cancer-associated genetic mutations/disorders.

The volume of plasma can depend on the desired read depth for sequenced regions. Exemplary volumes are 0.4-40 mL, 5-20 mL, and 10-20 mL. For example, the volume can be 0.5 mL, 1 mL, 5 mL, 10 mL, 20 mL, 30 mL, or 40 mL. A volume of sampled plasma may be 5 to 20 mL.

A sample can comprise various amounts of nucleic acid that contain genome equivalents. For example, a sample of about 30 ng DNA can contain about 10,000 (10⁴) haploid human genome equivalents and, in the case of cfDNA, about 200 billion (2×10¹¹) individual polynucleotide molecules. Similarly, a sample of about 100 ng of DNA can contain about 30,000 haploid human genome equivalents and, in the case of cfDNA, about 600 billion individual molecules.
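By way of non-limiting illustration, the order of magnitude of these figures can be reproduced with a short calculation. The constants below (about 3.3 pg per haploid human genome, about 650 g/mol per base pair, and a typical cfDNA fragment length of 168 base pairs) are assumptions made for this sketch rather than values recited above, and the results are approximate.

```python
HAPLOID_GENOME_PG = 3.3      # assumed mass of one haploid human genome, in pg
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 650.0    # assumed average molar mass of one base pair
FRAGMENT_BP = 168            # assumed typical cfDNA fragment length


def genome_equivalents(mass_ng: float) -> float:
    """Approximate number of haploid genome equivalents in mass_ng of DNA."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG  # convert ng to pg first


def fragment_count(mass_ng: float, fragment_bp: int = FRAGMENT_BP) -> float:
    """Approximate number of cfDNA molecules of the given length in mass_ng."""
    grams = mass_ng * 1e-9
    grams_per_molecule = fragment_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return grams / grams_per_molecule


for mass in (30.0, 100.0):
    print(mass, round(genome_equivalents(mass)), f"{fragment_count(mass):.2e}")
```

Under these assumptions, 30 ng corresponds to roughly 10⁴ haploid genome equivalents and on the order of 10¹¹ molecules, consistent with the approximate figures given above.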

A sample can comprise nucleic acids from different sources, e.g., cellular and cell-free nucleic acids of the same subject, or cellular and cell-free nucleic acids of different subjects. A sample can comprise nucleic acids carrying mutations. For example, a sample can comprise DNA carrying germline mutations and/or somatic mutations. Germline mutations refer to mutations existing in germline DNA of a subject. Somatic mutations refer to mutations originating in somatic cells of a subject, e.g., cancer cells. A sample can comprise DNA carrying cancer-associated mutations (e.g., cancer-associated somatic mutations). A sample can comprise an epigenetic variant (i.e., a chemical or protein modification), wherein the epigenetic variant is associated with the presence of a genetic variant such as a cancer-associated mutation. In some embodiments, the sample comprises an epigenetic variant associated with the presence of a genetic variant, wherein the sample does not comprise the genetic variant.

Exemplary amounts of cell-free nucleic acids in a sample before amplification range from about 1 fg to about 1 μg, e.g., 1 pg to 200 ng, 1 ng to 100 ng, 10 ng to 1000 ng. For example, the amount can be up to about 600 ng, up to about 500 ng, up to about 400 ng, up to about 300 ng, up to about 200 ng, up to about 100 ng, up to about 50 ng, or up to about 20 ng of cell-free nucleic acid molecules. The amount can be at least 1 fg, at least 10 fg, at least 100 fg, at least 1 pg, at least 10 pg, at least 100 pg, at least 1 ng, at least 10 ng, at least 100 ng, at least 150 ng, or at least 200 ng of cell-free nucleic acid molecules. The amount can be up to 1 femtogram (fg), 10 fg, 100 fg, 1 picogram (pg), 10 pg, 100 pg, 1 ng, 10 ng, 100 ng, 150 ng, or 200 ng of cell-free nucleic acid molecules. The method can comprise obtaining 1 femtogram (fg) to 200 ng-

Cell-free DNA refers to DNA not contained within a cell at the time of its isolation from a subject. For example, cfDNA can be isolated from a sample as the DNA remaining in the sample after removing intact cells, without lysing the cells or otherwise extracting intracellular DNA. Cell-free nucleic acids include DNA, RNA, and hybrids thereof, including genomic DNA, mitochondrial DNA, siRNA, miRNA, circulating RNA (cRNA), tRNA, rRNA, small nucleolar RNA (snoRNA), Piwi-interacting RNA (piRNA), long non-coding RNA (long ncRNA), or fragments of any of these. Cell-free nucleic acids can be double-stranded, single-stranded, or a hybrid thereof. A cell-free nucleic acid can be released into bodily fluid through secretion or cell death processes, e.g., cellular necrosis and apoptosis. Some cell-free nucleic acids are released into bodily fluid from cancer cells, e.g., circulating tumor DNA (ctDNA). Others are released from healthy cells. In some embodiments, cfDNA is cell-free fetal DNA (cffDNA). In some embodiments, cell-free nucleic acids are produced by tumor cells. In some embodiments, cell-free nucleic acids are produced by a mixture of tumor cells and non-tumor cells.

Cell-free nucleic acids have an exemplary size distribution of about 100-500 nucleotides, with molecules of 110 to about 230 nucleotides representing about 90% of molecules, with a mode of about 168 nucleotides and a second minor peak in a range between 240 to 440 nucleotides.

Cell-free nucleic acids can be isolated from bodily fluids through a fractionation or partitioning step in which cell-free nucleic acids, as found in solution, are separated from intact cells and other non-soluble components of the bodily fluid. Partitioning may include techniques such as centrifugation or filtration. Alternatively, cells in bodily fluids can be lysed and cell-free and cellular nucleic acids processed together. Generally, after addition of buffers and wash steps, nucleic acids can be precipitated with an alcohol. Further clean-up steps may be used, such as silica-based columns, to remove contaminants or salts. Non-specific bulk carrier nucleic acids, such as C 1 DNA, DNA or protein for bisulfite sequencing, hybridization, and/or ligation, may be added throughout the reaction to optimize certain aspects of the procedure such as yield.

After such processing, samples can include various forms of nucleic acid including double stranded DNA, single stranded DNA and single stranded RNA. In some embodiments, single stranded DNA and RNA can be converted to double stranded forms so they are included in subsequent processing and analysis steps.

Double-stranded DNA molecules in a sample and single stranded nucleic acid molecules converted to double stranded DNA molecules can be linked to adapters at either one end or both ends. Typically, double stranded molecules are blunt ended by treatment with a polymerase having 5′-3′ polymerase activity and 3′-5′ exonuclease (or proofreading) activity, in the presence of all four standard nucleotides. Klenow large fragment and T4 polymerase are examples of suitable polymerases. The blunt ended DNA molecules can be ligated with an at least partially double stranded adapter (e.g., a Y-shaped or bell-shaped adapter). Alternatively, complementary nucleotides can be added to blunt ends of sample nucleic acids and adapters to facilitate ligation. Contemplated herein are both blunt end ligation and sticky end ligation. In blunt end ligation, both the nucleic acid molecules and the adapter tags have blunt ends. In sticky-end ligation, typically, the nucleic acid molecules bear an “A” overhang and the adapters bear a “T” overhang.

2. Amplification

Sample nucleic acids flanked by adapters can be amplified by PCR and other amplification methods. Amplification is typically primed by primers binding to primer binding sites in adapters flanking a DNA molecule to be amplified. Amplification methods can involve cycles of denaturation, annealing and extension, resulting from thermocycling, or can be isothermal as in transcription-mediated amplification. Other amplification methods include the ligase chain reaction, strand displacement amplification, nucleic acid sequence based amplification, and self-sustained sequence based replication.

In some embodiments, the present methods perform dsDNA ligations with T-tailed and C-tailed adapters, which result in amplification of at least 50, 60, 70 or 80% of double stranded nucleic acids before linking to adapters. Preferably the present methods increase the amount or number of amplified molecules relative to control methods performed with T-tailed adapters alone by at least 10, 15 or 20%.

3. Bait Sets; Capture Moieties

As discussed above, nucleic acids in a sample can be subject to a capture step, in which molecules having target sequences are captured for subsequent analysis. Target capture can involve use of a bait set comprising oligonucleotide baits labeled with a capture moiety, such as biotin or the other examples noted below. The probes can have sequences selected to tile across a panel of regions, such as genes. In some embodiments, a bait set can have higher and lower capture yields for sets of target regions such as those of the sequence-variable target region set and the epigenetic target region set, respectively, as discussed elsewhere herein. Such bait sets are combined with a sample under conditions that allow hybridization of the target molecules with the baits. Then, captured molecules are isolated using the capture moiety. For example, a biotin capture moiety can be captured by bead-based streptavidin. Such methods are further described in, for example, U.S. Pat. No. 9,850,523, issued Dec. 26, 2017, which is incorporated herein by reference.

Capture moieties include, without limitation, biotin, avidin, streptavidin, a nucleic acid comprising a particular nucleotide sequence, a hapten recognized by an antibody, and magnetically attractable particles. The capture moiety can be a member of a binding pair, such as biotin/streptavidin or hapten/antibody. In some embodiments, a capture moiety that is attached to an analyte is captured by its binding pair, which is attached to an isolatable moiety, such as a magnetically attractable particle or a large particle that can be sedimented through centrifugation. The capture moiety can be any type of molecule that allows affinity separation of nucleic acids bearing the capture moiety from nucleic acids lacking the capture moiety. Exemplary capture moieties are biotin, which allows affinity separation by binding to streptavidin linked or linkable to a solid phase, or an oligonucleotide, which allows affinity separation through binding to a complementary oligonucleotide linked or linkable to a solid phase.

D. Collections of Target-Specific Probes

In some embodiments, a collection of target-specific probes is used in methods described herein. In some embodiments, the collection of target-specific probes comprises target-binding probes specific for a sequence-variable target region set and target-binding probes specific for an epigenetic target region set. In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is higher (e.g., at least 2-fold higher) than the capture yield of the target-binding probes specific for the epigenetic target region set. In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set higher (e.g., at least 2-fold higher) than its capture yield specific for the epigenetic target region set.

In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set. In some embodiments, the capture yield of the target-binding probes specific for the sequence-variable target region set is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than the capture yield of the target-binding probes specific for the epigenetic target region set.

In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than its capture yield for the epigenetic target region set. In some embodiments, the collection of target-specific probes is configured to have a capture yield specific for the sequence-variable target region set that is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than its capture yield specific for the epigenetic target region set.

The collection of probes can be configured to provide higher capture yields for the sequence-variable target region set in various ways, including concentration, different lengths and/or chemistries (e.g., that affect affinity), and combinations thereof. Affinity can be modulated by adjusting probe length and/or including nucleotide modifications as discussed below.

In some embodiments, the target-specific probes specific for the sequence-variable target region set are present at a higher concentration than the target-specific probes specific for the epigenetic target region set. In some embodiments, the concentration of the target-binding probes specific for the sequence-variable target region set is at least 1.25-, 1.5-, 1.75-, 2-, 2.25-, 2.5-, 2.75-, 3-, 3.5-, 4-, 4.5-, 5-, 6-, 7-, 8-, 9-, 10-, 11-, 12-, 13-, 14-, or 15-fold higher than the concentration of the target-binding probes specific for the epigenetic target region set. In some embodiments, the concentration of the target-binding probes specific for the sequence-variable target region set is 1.25- to 1.5-, 1.5- to 1.75-, 1.75- to 2-, 2- to 2.25-, 2.25- to 2.5-, 2.5- to 2.75-, 2.75- to 3-, 3- to 3.5-, 3.5- to 4-, 4- to 4.5-, 4.5- to 5-, 5- to 5.5-, 5.5- to 6-, 6- to 7-, 7- to 8-, 8- to 9-, 9- to 10-, 10- to 11-, 11- to 12-, 13- to 14-, or 14- to 15-fold higher than the concentration of the target-binding probes specific for the epigenetic target region set. In such embodiments, concentration may refer to the average mass per volume concentration of individual probes in each set.

In some embodiments, the target-specific probes specific for the sequence-variable target region set have a higher affinity for their targets than the target-specific probes specific for the epigenetic target region set. Affinity can be modulated in any way known to those skilled in the art, including by using different probe chemistries. For example, certain nucleotide modifications, such as cytosine 5-methylation (in certain sequence contexts), modifications that provide a heteroatom at the 2′ sugar position, and LNA nucleotides, can increase stability of double-stranded nucleic acids, indicating that oligonucleotides with such modifications have relatively higher affinity for their complementary sequences. See, e.g., Severin et al., Nucleic Acids Res. 39: 8740-8751 (2011); Freier et al., Nucleic Acids Res. 25: 4429-4443 (1997); U.S. Pat. No. 9,738,894. Also, longer sequence lengths will generally provide increased affinity. Other nucleotide modifications, such as the substitution of the nucleobase hypoxanthine for guanine, reduce affinity by reducing the amount of hydrogen bonding between the oligonucleotide and its complementary sequence. In some embodiments, the target-specific probes specific for the sequence-variable target region set have modifications that increase their affinity for their targets. In some embodiments, alternatively or additionally, the target-specific probes specific for the epigenetic target region set have modifications that decrease their affinity for their targets. In some embodiments, the target-specific probes specific for the sequence-variable target region set have longer average lengths and/or higher average melting temperatures than the target-specific probes specific for the epigenetic target region set. These embodiments may be combined with each other and/or with differences in concentration as discussed above to achieve a desired fold difference in capture yield, such as any fold difference or range thereof described above.

In some embodiments, the target-specific probes comprise a capture moiety. The capture moiety may be any of the capture moieties described herein, e.g., biotin. In some embodiments, the target-specific probes are linked to a solid support, e.g., covalently or non-covalently such as through the interaction of a binding pair of capture moieties. In some embodiments, the solid support is a bead, such as a magnetic bead.

In some embodiments, the target-specific probes specific for the sequence-variable target region set and/or the target-specific probes specific for the epigenetic target region set are a bait set as discussed above, e.g., probes comprising capture moieties and sequences selected to tile across a panel of regions, such as genes.

In some embodiments, the target-specific probes are provided in a single composition. The single composition may be a solution (liquid or frozen). Alternatively, it may be a lyophilizate.

Alternatively, the target-specific probes may be provided as a plurality of compositions, e.g., comprising a first composition comprising probes specific for the epigenetic target region set and a second composition comprising probes specific for the sequence-variable target region set. These probes may be mixed in appropriate proportions to provide a combined probe composition with any of the foregoing fold differences in concentration and/or capture yield. Alternatively, they may be used in separate capture procedures (e.g., with aliquots of a sample or sequentially with the same sample) to provide first and second compositions comprising captured epigenetic target regions and sequence-variable target regions, respectively.
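By way of illustration only, the mixing of two probe compositions in proportions that give a desired fold difference in concentration can be sketched as follows. This is a minimal sketch, not a protocol from the present disclosure: the function name mixing_volumes, its parameters, and the assumption that the two stocks are combined without additional diluent are hypothetical and chosen only for the example.

```python
def mixing_volumes(stock_sv, stock_epi, fold, total_volume):
    """Return (v_sv, v_epi): volumes of the sequence-variable and epigenetic
    probe stocks to combine so that, in the mixture, the average per-probe
    concentration of the sequence-variable probes is `fold` times that of
    the epigenetic probes.

    Hypothetical helper. Assumes both stock concentrations use the same
    units (e.g., average mass per volume per probe) and no diluent is added.
    """
    if min(stock_sv, stock_epi, fold, total_volume) <= 0:
        raise ValueError("all inputs must be positive")
    # In the mixture: c_sv = stock_sv * v_sv / V and c_epi = stock_epi * v_epi / V.
    # Requiring c_sv / c_epi == fold gives v_sv / v_epi = fold * stock_epi / stock_sv.
    volume_ratio = fold * stock_epi / stock_sv
    v_epi = total_volume / (1.0 + volume_ratio)
    v_sv = total_volume - v_epi
    return v_sv, v_epi


# Example: equal stock concentrations, 2-fold target, 30 units of total volume.
v_sv, v_epi = mixing_volumes(1.0, 1.0, 2.0, 30.0)
```

With equal stock concentrations, a 2-fold difference corresponds to combining the stocks at a 2:1 volume ratio; unequal stocks shift the ratio accordingly.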

1. Probes Specific for Epigenetic Target Regions

The probes for the epigenetic target region set may comprise probes specific for one or more types of target regions likely to differentiate DNA from neoplastic (e.g., tumor or cancer) cells from healthy cells, e.g., non-neoplastic circulating cells. Exemplary types of such regions are discussed in detail herein, e.g., in the sections above concerning captured sets. The probes for the epigenetic target region set may also comprise probes for one or more control regions, e.g., as described herein.

In some embodiments, the probes for the epigenetic target region set have a footprint of at least 100 kbp, e.g., at least 200 kbp, at least 300 kbp, or at least 400 kbp. In some embodiments, the epigenetic target region set has a footprint in the range of 100 kbp to 20 Mbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, 1.5-2 Mbp, 2-3 Mbp, 3-4 Mbp, 4-5 Mbp, 5-6 Mbp, 6-7 Mbp, 7-8 Mbp, 8-9 Mbp, 9-10 Mbp, or 10-20 Mbp. In some embodiments, the epigenetic target region set has a footprint of at least 20 Mbp.
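For illustration, a footprint of a target region set can be computed from a list of genomic intervals. The following is a minimal sketch that assumes the footprint is the total number of distinct bases covered by the regions, so that overlapping regions are counted once; the function name footprint_bp and the 0-based, half-open (chrom, start, end) coordinate convention are assumptions made for the example rather than definitions from the present disclosure.

```python
def footprint_bp(regions):
    """Total number of distinct bases covered by `regions`.

    `regions` is an iterable of (chrom, start, end) tuples using 0-based,
    half-open coordinates (an assumed convention). Overlapping or adjacent
    intervals on the same chromosome are merged before counting.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        if end <= start:
            raise ValueError(f"empty or inverted interval: {chrom}:{start}-{end}")
        by_chrom.setdefault(chrom, []).append((start, end))

    total = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps or abuts the current merged interval: extend it.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


# Example: two overlapping regions on chr1 and one region on chr2.
example_regions = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 10)]
example_footprint = footprint_bp(example_regions)
```

The result in base pairs can then be compared against any of the footprint thresholds or ranges recited above (e.g., at least 100 kbp corresponds to a value of at least 100,000).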

a. Hypermethylation Variable Target Regions

In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypermethylation variable target regions. Hypermethylation variable target regions may also be referred to herein as hypermethylated DMRs (differentially methylated regions). The hypermethylation variable target regions may be any of those set forth above. For example, in some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1. In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 2. In some embodiments, the probes specific for hypermethylation variable target regions comprise probes specific for a plurality of loci listed in Table 1 or Table 2, e.g., at least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, or 100% of the loci listed in Table 1 or Table 2. In some embodiments, for each locus included as a target region, there may be one or more probes with a hybridization site that binds between the transcription start site and the stop codon (the last stop codon for genes that are alternatively spliced) of the gene. In some embodiments, the one or more probes bind within 300 bp of the listed position, e.g., within 200 or 100 bp. In some embodiments, a probe has a hybridization site overlapping the position listed above. In some embodiments, the probes specific for the hypermethylation target regions include probes specific for one, two, three, four, or five subsets of hypermethylation target regions that collectively show hypermethylation in one, two, three, four, or five of breast, colon, kidney, liver, and lung cancers.

b. Hypomethylation Variable Target Regions

In some embodiments, the probes for the epigenetic target region set comprise probes specific for one or more hypomethylation variable target regions. Hypomethylation variable target regions may also be referred to herein as hypomethylated DMRs (differentially methylated regions). The hypomethylation variable target regions may be any of those set forth above. For example, the probes specific for one or more hypomethylation variable target regions may include probes for regions such as repeated elements, e.g., LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and satellite DNA, and intergenic regions that are ordinarily methylated in healthy cells and may show reduced methylation in tumor cells.

In some embodiments, probes specific for hypomethylation variable target regions include probes specific for repeated elements and/or intergenic regions. In some embodiments, probes specific for repeated elements include probes specific for one, two, three, four, or five of LINE1 elements, Alu elements, centromeric tandem repeats, pericentromeric tandem repeats, and/or satellite DNA.

Exemplary probes specific for genomic regions that show cancer-associated hypomethylation include probes specific for nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1. In some embodiments, the probes specific for hypomethylation variable target regions include probes specific for regions overlapping or comprising nucleotides 8403565-8953708 and/or 151104701-151106035 of human chromosome 1.

c. CTCF Binding Regions

In some embodiments, the probes for the epigenetic target region set include probes specific for CTCF binding regions. In some embodiments, the probes specific for CTCF binding regions comprise probes specific for at least 10, 20, 50, 100, 200, or 500 CTCF binding regions, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 CTCF binding regions, e.g., CTCF binding regions described above or in one or more of CTCFBSDB or the Cuddapah et al., Martin et al., or Rhee et al. articles cited above. In some embodiments, the probes for the epigenetic target region set comprise at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream regions of the CTCF binding sites.

d. Transcription Start Sites

In some embodiments, the probes for the epigenetic target region set include probes specific for transcriptional start sites. In some embodiments, the probes specific for transcriptional start sites comprise probes specific for at least 10, 20, 50, 100, 200, or 500 transcriptional start sites, or 10-20, 20-50, 50-100, 100-200, 200-500, or 500-1000 transcriptional start sites, e.g., transcriptional start sites listed in DBTSS. In some embodiments, the probes for the epigenetic target region set comprise probes for sequences at least 100 bp, at least 200 bp, at least 300 bp, at least 400 bp, at least 500 bp, at least 750 bp, or at least 1000 bp upstream and downstream of the transcriptional start sites.
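For illustration, target windows extending a fixed distance upstream and downstream of a site, such as a transcriptional start site or a CTCF binding site, can be derived as follows. This is a minimal sketch under stated assumptions: the function name flanking_window, the 0-based, half-open coordinate convention, and the clipping to chromosome bounds are hypothetical choices for the example and are not taken from the present disclosure.

```python
def flanking_window(position, flank, chrom_length=None):
    """Return (start, end) of a window covering `flank` bases on each side of
    `position`, plus the position itself.

    Coordinates are 0-based and half-open (an assumed convention). The window
    is clipped at 0 and, if `chrom_length` is given, at the chromosome end.
    """
    if position < 0 or flank < 0:
        raise ValueError("position and flank must be non-negative")
    start = max(0, position - flank)
    end = position + flank + 1
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


# Example: a window of 500 bp upstream and downstream of a site at position 1000.
window = flanking_window(1000, 500)
```

Because the window is symmetric, strand orientation does not need to be known to cover both the upstream and downstream flanks.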

e. Focal Amplifications

As noted above, although focal amplifications are somatic mutations, they can be detected by sequencing based on read frequency in a manner analogous to approaches for detecting certain epigenetic changes such as changes in methylation. As such, regions that may show focal amplifications in cancer can be included in the epigenetic target region set, as discussed above. In some embodiments, the probes specific for the epigenetic target region set include probes specific for focal amplifications. In some embodiments, the probes specific for focal amplifications include probes specific for one or more of AR, BRAF, CCND1, CCND2, CCNE1, CDK4, CDK6, EGFR, ERBB2, FGFR1, FGFR2, KIT, KRAS, MET, MYC, PDGFRA, PIK3CA, and RAF1. For example, in some embodiments, the probes specific for focal amplifications include probes specific for at least 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, or 18 of the foregoing targets.
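For illustration, detection of a focal amplification based on read frequency can be sketched as a comparison of per-gene read depth in a test sample against a reference. This is a minimal sketch, not the analysis method of the present disclosure: the function names, the median-ratio normalization, and the 2.0-fold threshold are assumptions made for the example.

```python
from statistics import median


def amplification_ratios(sample_depth, reference_depth):
    """Per-gene depth ratio (sample / reference), rescaled so that the median
    ratio across genes equals 1.

    Both arguments map gene name -> mean read depth. Genes missing from the
    reference, or with zero reference depth, are skipped. Median scaling is
    an assumed normalization for overall differences in sequencing yield.
    """
    raw = {
        gene: depth / reference_depth[gene]
        for gene, depth in sample_depth.items()
        if reference_depth.get(gene, 0) > 0
    }
    if not raw:
        raise ValueError("no genes with usable reference depth")
    scale = median(raw.values())
    if scale <= 0:
        raise ValueError("median depth ratio must be positive")
    return {gene: ratio / scale for gene, ratio in raw.items()}


def call_amplified(ratios, threshold=2.0):
    """Return the sorted genes whose normalized ratio meets the (assumed) threshold."""
    return sorted(gene for gene, ratio in ratios.items() if ratio >= threshold)


# Example with made-up depths: ERBB2 has four times the expected read depth.
sample = {"EGFR": 100.0, "MYC": 100.0, "ERBB2": 400.0}
reference = {"EGFR": 100.0, "MYC": 100.0, "ERBB2": 100.0}
amplified = call_amplified(amplification_ratios(sample, reference))
```

In practice, depth would typically be computed from deduplicated reads, and a threshold would be chosen from the observed variability across samples rather than fixed in advance.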

f. Control Regions

It can be useful to include control regions to facilitate data validation. In some embodiments, the probes specific for the epigenetic target region set include probes specific for control methylated regions that are expected to be methylated in essentially all samples. In some embodiments, the probes specific for the epigenetic target region set include probes specific for control hypomethylated regions that are expected to be hypomethylated in essentially all samples.

2. Probes Specific for Sequence-Variable Target Regions

The probes for the sequence-variable target region set may comprise probes specific for a plurality of regions known to undergo somatic mutations in cancer. The probes may be specific for any sequence-variable target region set described herein. Exemplary sequence-variable target region sets are discussed in detail herein, e.g., in the sections above concerning captured sets.

In some embodiments, the sequence-variable target region probe set has a footprint of at least 0.5 kb, e.g., at least 1 kb, at least 2 kb, at least 5 kb, at least 10 kb, at least 20 kb, at least 30 kb, or at least 40 kb. In some embodiments, the sequence-variable target region probe set has a footprint in the range of 0.5-100 kb, e.g., 0.5-2 kb, 2-10 kb, 10-20 kb, 20-30 kb, 30-40 kb, 40-50 kb, 50-60 kb, 60-70 kb, 70-80 kb, 80-90 kb, or 90-100 kb. In some embodiments, the sequence-variable target region probe set has a footprint of at least 50 kbp, e.g., at least 100 kbp, at least 200 kbp, at least 300 kbp, or at least 400 kbp. In some embodiments, the sequence-variable target region probe set has a footprint in the range of 100-2000 kbp, e.g., 100-200 kbp, 200-300 kbp, 300-400 kbp, 400-500 kbp, 500-600 kbp, 600-700 kbp, 700-800 kbp, 800-900 kbp, 900-1,000 kbp, 1-1.5 Mbp, or 1.5-2 Mbp. In some embodiments, the sequence-variable target region set has a footprint of at least 2 Mbp.

In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the genes of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, or 70 of the SNVs of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, or 3 of the indels of Table 3. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the genes of Table 4. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least 5, at least 10, at least 15, at least 20, at least 25, at least 30, at least 35, at least 40, at least 45, at least 50, at least 55, at least 60, at least 65, at least 70, or 73 of the SNVs of Table 4. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least 1, at least 2, at least 3, at least 4, at least 5, or 6 of the fusions of Table 4. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, or 18 of the indels of Table 4. In some embodiments, probes specific for the sequence-variable target region set comprise probes specific for at least a portion of at least 1, at least 2, at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, at least 10, at least 11, at least 12, at least 13, at least 14, at least 15, at least 16, at least 17, at least 18, at least 19, or at least 20 of the genes of Table 5.

In some embodiments, the probes specific for the sequence-variable target region set comprise probes specific for target regions from at least 10, 20, 30, or 35 cancer-related genes, such as AKT1, ALK, BRAF, CCND1, CDK2A, CTNNB1, EGFR, ERBB2, ESR1, FGFR1, FGFR2, FGFR3, FOXL2, GATA3, GNA11, GNAQ, GNAS, HRAS, IDH1, IDH2, KIT, KRAS, MED12, MET, MYC, NFE2L2, NRAS, PDGFRA, PIK3CA, PPP2R1A, PTEN, RET, STK11, TP53, and U2AF1.

E. Compositions Comprising Captured DNA

Provided herein is a combination comprising first and second populations of DNA, wherein the second population comprises fragments of DNA with ends, or attached tags or adapters, at a recognition site of at least one methylation-dependent nuclease, which may be any one or any combination of the methylation-dependent nucleases described herein. In some embodiments, the first and second populations are differentially tagged. The first population may comprise or be derived from DNA with a cytosine modification in a greater proportion than the second population. The first population may comprise a form of a first nucleobase originally present in the DNA with altered base pairing specificity and a second nucleobase without altered base pairing specificity, wherein the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the form of the first nucleobase originally present in the DNA prior to alteration of base pairing specificity and the second nucleobase have the same base pairing specificity. In some embodiments, the cytosine modification is cytosine methylation. In some embodiments, the first nucleobase is a modified or unmodified cytosine and the second nucleobase is a modified or unmodified cytosine. The first and second nucleobase may be any of those discussed herein in the Summary or with respect to subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample. In some embodiments, the first population comprises fragments of DNA with ends, or attached tags or adapters, at a recognition site of at least one methylation-sensitive nuclease, which may be any one or any combination of the methylation-sensitive nucleases described herein.

In some embodiments, the first population comprises a sequence tag selected from a first set of one or more sequence tags and the second population comprises a sequence tag selected from a second set of one or more sequence tags, and the second set of sequence tags is different from the first set of sequence tags. The sequence tags may comprise barcodes.
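For illustration, differential tagging allows sequence reads to be assigned back to the first or second population by looking up the tag. The following is a minimal sketch that assumes the two tag sets are disjoint and that tags are matched exactly; the function name assign_population and the example barcode sequences are hypothetical and are not taken from the present disclosure.

```python
def assign_population(read_tag, first_tags, second_tags):
    """Return "first", "second", or None for a read carrying `read_tag`.

    `first_tags` and `second_tags` are sets of tag sequences. They are
    required to be disjoint here (an assumption of this sketch) so that each
    tag identifies at most one population. Matching is exact, with no
    tolerance for sequencing errors.
    """
    if first_tags & second_tags:
        raise ValueError("tag sets must not share any tag")
    if read_tag in first_tags:
        return "first"
    if read_tag in second_tags:
        return "second"
    return None  # unrecognized tag


# Example with made-up barcodes.
first_set = {"ACGTAC", "TTGCAA"}
second_set = {"GGATCC", "CATGTT"}
labels = [assign_population(tag, first_set, second_set)
          for tag in ("ACGTAC", "CATGTT", "NNNNNN")]
```

A production demultiplexer would usually tolerate a limited number of mismatches, which in turn requires tags that are sufficiently distinct from one another.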

In some embodiments, the first population comprises protected hmC, such as glucosylated hmC.

In some embodiments, the first population was subjected to any of the conversion procedures discussed herein, such as bisulfite conversion, Ox-BS conversion, TAB conversion, ACE conversion, TAP conversion, TAPSβ conversion, or CAP conversion. In some embodiments, the first population was subjected to protection of hmC followed by deamination of mC and/or C.

In some embodiments of the combination, the first population comprises or was derived from DNA with a cytosine modification in a greater proportion than the second population and the first population comprises first and second subpopulations, and the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity. In some embodiments, the second population does not comprise the first nucleobase. In some embodiments, the first nucleobase is a modified or unmodified cytosine, and the second nucleobase is a modified or unmodified cytosine, optionally wherein the modified cytosine is mC or hmC. In some embodiments, the first nucleobase is a modified or unmodified adenine, and the second nucleobase is a modified or unmodified adenine, optionally wherein the modified adenine is mA.

In some embodiments, the first nucleobase (e.g., a modified cytosine) is biotinylated. In some embodiments, the first nucleobase (e.g., a modified cytosine) is a product of a Huisgen cycloaddition to β-6-azide-glucosyl-5-hydroxymethylcytosine that comprises an affinity label (e.g., biotin).

In any of the combinations described herein, the captured DNA may comprise cfDNA.

The captured DNA may have any of the features described herein concerning captured sets, including, e.g., a greater concentration of the DNA corresponding to the sequence-variable target region set (normalized for footprint size as discussed above) than of the DNA corresponding to the epigenetic target region set. In some embodiments, the DNA of the captured set comprises sequence tags, which may be added to the DNA as described herein. In general, the inclusion of sequence tags results in the DNA molecules differing from their naturally occurring, untagged form.

The combination may further comprise a probe set described herein or sequencing primers, each of which may differ from naturally occurring nucleic acid molecules. For example, a probe set described herein may comprise a capture moiety, and sequencing primers may comprise a non-naturally occurring label.

F. Computer Systems

Methods of the present disclosure can be implemented using, or with the aid of, computer systems. For example, such methods may comprise: partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample; subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity; and sequencing DNA in the first subsample and DNA in the second subsample in a manner that distinguishes the first nucleobase from the second nucleobase in the DNA of the first subsample.

FIG. 5 shows a computer system 501 that is programmed or otherwise configured to implement the methods of the present disclosure. The computer system 501 can regulate various aspects of sample preparation, sequencing, and/or analysis. In some examples, the computer system 501 is configured to perform sample preparation and sample analysis, including nucleic acid sequencing.

The computer system 501 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 505, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 501 also includes memory or memory location 510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 515 (e.g., hard disk), communication interface 520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 525, such as cache, other memory, data storage, and/or electronic display adapters. The memory 510, storage unit 515, interface 520, and peripheral devices 525 are in communication with the CPU 505 through a communication network or bus (solid lines), such as a motherboard. The storage unit 515 can be a data storage unit (or data repository) for storing data. The computer system 501 can be operatively coupled to a computer network 530 with the aid of the communication interface 520. The computer network 530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The computer network 530 in some cases is a telecommunication and/or data network. The computer network 530 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The computer network 530, in some cases with the aid of the computer system 501, can implement a peer-to-peer network, which may enable devices coupled to the computer system 501 to behave as a client or a server.

The CPU 505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 510. Examples of operations performed by the CPU 505 can include fetch, decode, execute, and writeback.

The storage unit 515 can store files, such as drivers, libraries, and saved programs. The storage unit 515 can store programs generated by users and recorded sessions, as well as output(s) associated with the programs. The storage unit 515 can store user data, e.g., user preferences and user programs. The computer system 501 in some cases can include one or more additional data storage units that are external to the computer system 501, such as located on a remote server that is in communication with the computer system 501 through an intranet or the Internet. Data may be transferred from one location to another using, for example, a communication network or physical data transfer (e.g., using a hard drive, thumb drive, or other data storage mechanism).

The computer system 501 can communicate with one or more remote computer systems through the network 530. For example, the computer system 501 can communicate with a remote computer system of a user (e.g., operator). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PCs (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, smartphones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 501 via the network 530.

Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 501, such as, for example, on the memory 510 or electronic storage unit 515. The machine executable or machine-readable code can be provided in the form of software. During use, the code can be executed by the processor 505. In some cases, the code can be retrieved from the storage unit 515 and stored on the memory 510 for ready access by the processor 505. In some situations, the electronic storage unit 515 can be precluded, and machine-executable instructions are stored on memory 510.

In an aspect, the present disclosure provides a non-transitory computer-readable medium comprising computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method comprising: partitioning a sample comprising DNA into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample; contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample; capturing a first target region set comprising epigenetic target regions from the first subsample and the treated first subsample, and capturing a second target region set comprising epigenetic target regions from the treated second subsample; and sequencing DNA in the first target region set and the second target region set.

In an aspect, the present disclosure provides a non-transitory computer-readable medium comprising computer-executable instructions which, when executed by at least one electronic processor, perform at least a portion of a method comprising: partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample; contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample; capturing a first target region set comprising epigenetic target regions from the first subsample and the treated first subsample; and sequencing DNA in the first target region set and DNA from the second subsample. In some embodiments, the method further comprises obtaining a plurality of sequence reads generated by a nucleic acid sequencer from the sequencing; mapping the plurality of sequence reads to one or more reference sequences to generate mapped sequence reads; and processing the mapped sequence reads to determine the likelihood that the subject has cancer.

The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.

Aspects of the systems and methods provided herein, such as the computer system 501, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming.

All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical, and electromagnetic waves, such as those used across physical interfaces between local devices, through wired and optical landline networks, and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links, or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution.

Hence, a machine-readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.

The computer system 501 can include or be in communication with an electronic display 535 that comprises a user interface (UI) 540 for providing, for example, one or more results of sample analysis. Examples of UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.

Additional details relating to computer systems and networks, databases, and computer program products are also provided in, for example, Peterson, Computer Networks: A Systems Approach, Morgan Kaufmann, 5th Ed. (2011), Kurose, Computer Networking: A Top-Down Approach, Pearson, 7th Ed. (2016), Elmasri, Fundamentals of Database Systems, Addison Wesley, 6th Ed. (2010), Coronel, Database Systems: Design, Implementation, & Management, Cengage Learning, 11th Ed. (2014), Tucker, Programming Languages, McGraw-Hill Science/Engineering/Math, 2nd Ed. (2006), and Rhoton, Cloud Computing Architected: Solution Design Handbook, Recursive Press (2011), each of which is hereby incorporated by reference in its entirety.

G. Applications

1. Cancer and Other Diseases

The present methods can be used to diagnose the presence of conditions, particularly cancer, in a subject, to characterize conditions (e.g., staging cancer or determining heterogeneity of a cancer), to monitor response to treatment of a condition, or to effect a prognosis of the risk of developing a condition or of the subsequent course of a condition. The present disclosure can also be useful in determining the efficacy of a particular treatment option. Successful treatment options may increase the amount of copy number variation or rare mutations detected in a subject's blood, as more cancer cells may die and shed DNA. In other examples, this may not occur. In another example, certain treatment options may be correlated with genetic profiles of cancers over time. This correlation may be useful in selecting a therapy. In some embodiments, hypermethylation variable epigenetic target regions are analyzed to determine whether they show hypermethylation characteristic of tumor cells or cells that do not ordinarily contribute significantly to cfDNA and/or hypomethylation variable epigenetic target regions are analyzed to determine whether they show hypomethylation characteristic of tumor cells or cells that do not ordinarily contribute significantly to cfDNA.

Additionally, if a cancer is observed to be in remission after treatment, the present methods can be used to monitor residual disease or recurrence of disease.

In some embodiments, the methods and systems disclosed herein may be used to identify customized or targeted therapies to treat a given disease or condition in patients based on the classification of a nucleic acid variant as being of somatic or germline origin. Typically, the disease under consideration is a type of cancer. Non-limiting examples of such cancers include biliary tract cancer, bladder cancer, transitional cell carcinoma, urothelial carcinoma, brain cancer, gliomas, astrocytomas, breast carcinoma, metaplastic carcinoma, cervical cancer, cervical squamous cell carcinoma, rectal cancer, colorectal carcinoma, colon cancer, hereditary nonpolyposis colorectal cancer, colorectal adenocarcinomas, gastrointestinal stromal tumors (GISTs), endometrial carcinoma, endometrial stromal sarcomas, esophageal cancer, esophageal squamous cell carcinoma, esophageal adenocarcinoma, ocular melanoma, uveal melanoma, gallbladder carcinomas, gallbladder adenocarcinoma, renal cell carcinoma, clear cell renal cell carcinoma, transitional cell carcinoma, urothelial carcinomas, Wilms tumor, leukemia, acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), chronic myelomonocytic leukemia (CMML), liver cancer, liver carcinoma, hepatoma, hepatocellular carcinoma, cholangiocarcinoma, hepatoblastoma, lung cancer, non-small cell lung cancer (NSCLC), mesothelioma, B-cell lymphomas, non-Hodgkin lymphoma, diffuse large B-cell lymphoma, mantle cell lymphoma, T cell lymphomas, non-Hodgkin lymphoma, precursor T-lymphoblastic lymphoma/leukemia, peripheral T cell lymphomas, multiple myeloma, nasopharyngeal carcinoma (NPC), neuroblastoma, oropharyngeal cancer, oral cavity squamous cell carcinomas, osteosarcoma, ovarian carcinoma, pancreatic cancer, pancreatic ductal adenocarcinoma, pseudopapillary neoplasms, acinar cell carcinomas, prostate cancer, prostate adenocarcinoma, skin cancer, melanoma, malignant melanoma, cutaneous melanoma, small intestine carcinomas, stomach cancer, gastric carcinoma, gastrointestinal stromal tumor (GIST), uterine cancer, or uterine sarcoma. Type and/or stage of cancer can be detected from genetic variations including mutations, rare mutations, indels, copy number variations, transversions, translocations, inversion, deletions, aneuploidy, partial aneuploidy, polyploidy, chromosomal instability, chromosomal structure alterations, gene fusions, chromosome fusions, gene truncations, gene amplification, gene duplications, chromosomal lesions, DNA lesions, abnormal changes in nucleic acid chemical modifications, abnormal changes in epigenetic patterns, and abnormal changes in nucleic acid 5-methylcytosine.

Genetic data can also be used for characterizing a specific form of cancer. Cancers are often heterogeneous in both composition and staging. Genetic profile data may allow characterization of specific sub-types of cancer that may be important in the diagnosis or treatment of that specific sub-type. This information may also provide a subject or practitioner clues regarding the prognosis of a specific type of cancer and allow either a subject or practitioner to adapt treatment options in accord with the progress of the disease. Some cancers can progress to become more aggressive and genetically unstable. Other cancers may remain benign, inactive or dormant. The system and methods of this disclosure may be useful in determining disease progression.

Further, the methods of the disclosure may be used to characterize the heterogeneity of an abnormal condition in a subject. Such methods can include, e.g., generating a genetic profile of extracellular polynucleotides derived from the subject, wherein the genetic profile comprises a plurality of data resulting from copy number variation and rare mutation analyses. In some embodiments, an abnormal condition is cancer. In some embodiments, the abnormal condition may be one resulting in a heterogeneous genomic population. In the example of cancer, some tumors are known to comprise tumor cells in different stages of the cancer. In other examples, heterogeneity may comprise multiple foci of disease. Again, in the example of cancer, there may be multiple tumor foci, perhaps where one or more foci are the result of metastases that have spread from a primary site.

The present methods can be used to generate a profile, fingerprint or set of data that is a summation of genetic information derived from different cells in a heterogeneous disease. This set of data may comprise copy number variation, epigenetic variation, and mutation analyses alone or in combination.

The present methods can be used to diagnose, prognose, monitor or observe cancers, or other diseases. In some embodiments, the methods herein do not involve diagnosing, prognosing or monitoring a fetus and as such are not directed to non-invasive prenatal testing. In other embodiments, these methodologies may be employed in a pregnant subject to diagnose, prognose, monitor or observe cancers or other diseases in an unborn subject whose DNA and other polynucleotides may co-circulate with maternal molecules.

Non-limiting examples of other genetic-based diseases, disorders, or conditions that are optionally evaluated using the methods and systems disclosed herein include achondroplasia, alpha-1 antitrypsin deficiency, antiphospholipid syndrome, autism, autosomal dominant polycystic kidney disease, Charcot-Marie-Tooth (CMT), cri du chat, Crohn's disease, cystic fibrosis, Dercum disease, Down syndrome, Duane syndrome, Duchenne muscular dystrophy, Factor V Leiden thrombophilia, familial hypercholesterolemia, familial Mediterranean fever, fragile X syndrome, Gaucher disease, hemochromatosis, hemophilia, holoprosencephaly, Huntington's disease, Klinefelter syndrome, Marfan syndrome, myotonic dystrophy, neurofibromatosis, Noonan syndrome, osteogenesis imperfecta, Parkinson's disease, phenylketonuria, Poland anomaly, porphyria, progeria, retinitis pigmentosa, severe combined immunodeficiency (SCID), sickle cell disease, spinal muscular atrophy, Tay-Sachs, thalassemia, trimethylaminuria, Turner syndrome, velocardiofacial syndrome, WAGR syndrome, Wilson disease, or the like.

In some embodiments, a method described herein comprises detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint following a previous cancer treatment of a subject previously diagnosed with cancer using a set of sequence information obtained as described herein. The method may further comprise determining a cancer recurrence score that is indicative of the presence or absence of the DNA originating or derived from the tumor cell for the test subject.

Where a cancer recurrence score is determined, it may further be used to determine a cancer recurrence status. The cancer recurrence status may be at risk for cancer recurrence, e.g., when the cancer recurrence score is above a predetermined threshold. The cancer recurrence status may be at low or lower risk for cancer recurrence, e.g., when the cancer recurrence score is below a predetermined threshold. In particular embodiments, a cancer recurrence score equal to the predetermined threshold may result in a cancer recurrence status of either at risk for cancer recurrence or at low or lower risk for cancer recurrence.

In some embodiments, a cancer recurrence score is compared with a predetermined cancer recurrence threshold, and the test subject is classified as a candidate for a subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for therapy when the cancer recurrence score is below the cancer recurrence threshold. In particular embodiments, a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for therapy.
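By way of non-limiting illustration, the threshold comparison described above can be sketched in Python as follows. This is a minimal sketch and not the disclosed implementation: the function name `classify_recurrence`, the example score and threshold values, and the `tie_is_positive` parameter (which chooses one of the two permitted outcomes when the score equals the threshold) are assumptions introduced for the example.

```python
def classify_recurrence(score: float, threshold: float,
                        tie_is_positive: bool = True) -> dict:
    """Hypothetical helper mapping a cancer recurrence score to a status.

    Assumption: a score above the threshold is positive for recurrence and
    a score below the threshold is negative, as in the text above.
    """
    if score > threshold:
        positive = True
    elif score < threshold:
        positive = False
    else:
        # A score equal to the threshold may be classified either way;
        # the caller picks the convention (illustrative parameter).
        positive = tie_is_positive
    return {
        "status": ("at risk for cancer recurrence" if positive
                   else "at low or lower risk for cancer recurrence"),
        "candidate_for_subsequent_treatment": positive,
    }


if __name__ == "__main__":
    # Example values are arbitrary and chosen only to exercise both branches.
    print(classify_recurrence(score=0.8, threshold=0.5))
    print(classify_recurrence(score=0.2, threshold=0.5))
```

Keeping the equality case as an explicit parameter reflects that the text allows either classification when the score equals the threshold.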

The methods discussed above may further comprise any compatible feature or features set forth elsewhere herein, including in the section regarding methods of determining a risk of cancer recurrence in a test subject and/or classifying a test subject as being a candidate for a subsequent cancer treatment.

2. Methods of Determining a Risk of Cancer Recurrence in a Test Subject and/or Classifying a Test Subject as being a Candidate for a Subsequent Cancer Treatment

In some embodiments, a method provided herein is a method of determining a risk of cancer recurrence in a test subject. In some embodiments, a method provided herein is a method of classifying a test subject as being a candidate for a subsequent cancer treatment.

Any of such methods may comprise collecting DNA (e.g., originating or derived from a tumor cell) from the test subject diagnosed with the cancer at one or more preselected timepoints following one or more previous cancer treatments to the test subject. The subject may be any of the subjects described herein. The DNA may be cfDNA. The DNA may be obtained from a tissue sample.

Any of such methods may comprise capturing a plurality of sets of target regions from DNA from the subject, wherein the plurality of target region sets comprises a sequence-variable target region set and an epigenetic target region set, whereby a captured set of DNA molecules is produced. The capturing step may be performed according to any of the embodiments described elsewhere herein.

In any of such methods, the previous cancer treatment may comprise surgery, administration of a therapeutic composition, and/or chemotherapy.

Any of such methods may comprise sequencing the captured DNA molecules, whereby a set of sequence information is produced. The captured DNA molecules of the sequence-variable target region set may be sequenced to a greater depth of sequencing than the captured DNA molecules of the epigenetic target region set.

Any of such methods may comprise detecting a presence or absence of DNA originating or derived from a tumor cell at a preselected timepoint using the set of sequence information. The detection of the presence or absence of DNA originating or derived from a tumor cell may be performed according to any of the embodiments thereof described elsewhere herein.

Methods of determining a risk of cancer recurrence in a test subject may comprise determining a cancer recurrence score that is indicative of the presence or absence, or amount, of the DNA originating or derived from the tumor cell for the test subject. The cancer recurrence score may further be used to determine a cancer recurrence status. The cancer recurrence status may be at risk for cancer recurrence, e.g., when the cancer recurrence score is above a predetermined threshold. The cancer recurrence status may be at low or lower risk for cancer recurrence, e.g., when the cancer recurrence score is below a predetermined threshold. In particular embodiments, a cancer recurrence score equal to the predetermined threshold may result in a cancer recurrence status of either at risk for cancer recurrence or at low or lower risk for cancer recurrence.

Methods of classifying a test subject as being a candidate for a subsequent cancer treatment may comprise comparing the cancer recurrence score of the test subject with a predetermined cancer recurrence threshold, thereby classifying the test subject as a candidate for the subsequent cancer treatment when the cancer recurrence score is above the cancer recurrence threshold or not a candidate for therapy when the cancer recurrence score is below the cancer recurrence threshold. In particular embodiments, a cancer recurrence score equal to the cancer recurrence threshold may result in classification as either a candidate for a subsequent cancer treatment or not a candidate for therapy. In some embodiments, the subsequent cancer treatment comprises chemotherapy or administration of a therapeutic composition.

Any of such methods may comprise determining a disease-free survival (DFS) period for the test subject based on the cancer recurrence score; for example, the DFS period may be 1 year, 2 years, 3 years, 4 years, 5 years, or 10 years.

In some embodiments, the set of sequence information comprises sequence-variable target region sequences, and determining the cancer recurrence score may comprise determining at least a first subscore indicative of the amount of SNVs, insertions/deletions, CNVs and/or fusions present in sequence-variable target region sequences.

In some embodiments, a number of mutations in the sequence-variable target regions chosen from 1, 2, 3, 4, or 5 is sufficient for the first subscore to result in a cancer recurrence score classified as positive for cancer recurrence. In some embodiments, the number of mutations is chosen from 1, 2, or 3.

In some embodiments, the set of sequence information comprises epigenetic target region sequences, and determining the cancer recurrence score comprises determining a second subscore indicative of the amount of molecules (obtained from the epigenetic target region sequences) that represent an epigenetic state different from DNA found in a corresponding sample from a healthy subject (e.g., cfDNA found in a blood sample from a healthy subject, or DNA found in a tissue sample from a healthy subject where the tissue sample is of the same type of tissue as was obtained from the test subject). These abnormal molecules (i.e., molecules with an epigenetic state different from DNA found in a corresponding sample from a healthy subject) may be consistent with epigenetic changes associated with cancer, e.g., methylation of hypermethylation variable target regions and/or perturbed fragmentation of fragmentation variable target regions, where “perturbed” means different from DNA found in a corresponding sample from a healthy subject.

In some embodiments, a proportion of molecules corresponding to the hypermethylation variable target region set and/or fragmentation variable target region set that indicate hypermethylation in the hypermethylation variable target region set and/or abnormal fragmentation in the fragmentation variable target region set greater than or equal to a value in the range of 0.001%-10% is sufficient for the second subscore to be classified as positive for cancer recurrence. The range may be 0.001%-1%, 0.005%-1%, 0.01%-5%, 0.01%-2%, or 0.01%-1%.
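As a non-limiting sketch of the proportion test described above, the following Python function compares the percentage of abnormal molecules with a cutoff. The function name `second_subscore_positive`, the default cutoff of 0.1%, and the input counts are assumptions for illustration only; the text specifies only that the cutoff lies in the range of 0.001%-10%.

```python
def second_subscore_positive(abnormal_molecules: int,
                             total_molecules: int,
                             cutoff_percent: float = 0.1) -> bool:
    """Hypothetical check of the second (epigenetic) subscore.

    abnormal_molecules: molecules indicating hypermethylation and/or
        abnormal fragmentation in the relevant target region set.
    total_molecules: all molecules corresponding to that target region set.
    cutoff_percent: illustrative cutoff; must lie in 0.001%-10%.
    """
    if total_molecules <= 0:
        raise ValueError("total_molecules must be positive")
    if not 0 <= abnormal_molecules <= total_molecules:
        raise ValueError("abnormal_molecules must be between 0 and total")
    if not 0.001 <= cutoff_percent <= 10:
        raise ValueError("cutoff_percent outside the 0.001%-10% range")
    proportion_percent = 100.0 * abnormal_molecules / total_molecules
    # "Greater than or equal to" follows the wording of the text.
    return proportion_percent >= cutoff_percent


if __name__ == "__main__":
    # 5 abnormal molecules out of 10,000 is 0.05%.
    print(second_subscore_positive(5, 10_000, cutoff_percent=0.01))
    print(second_subscore_positive(5, 10_000, cutoff_percent=1.0))
```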

In some embodiments, any of such methods may comprise determining a fraction of tumor DNA from the fraction of molecules in the set of sequence information that indicate one or more features indicative of origination from a tumor cell. This may be done for molecules corresponding to some or all of the epigenetic target regions, e.g., including one or both of hypermethylation variable target regions and fragmentation variable target regions (hypermethylation of a hypermethylation variable target region and/or abnormal fragmentation of a fragmentation variable target region may be considered indicative of origination from a tumor cell). This may be done for molecules corresponding to sequence variable target regions, e.g., molecules comprising alterations consistent with cancer, such as SNVs, indels, CNVs, and/or fusions. The fraction of tumor DNA may be determined based on a combination of molecules corresponding to epigenetic target regions and molecules corresponding to sequence variable target regions.

Determination of a cancer recurrence score may be based at least in part on the fraction of tumor DNA, wherein a fraction of tumor DNA greater than a threshold in the range of 10⁻¹¹ to 1 or 10⁻¹⁰ to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. In some embodiments, a fraction of tumor DNA greater than or equal to a threshold in the range of 10⁻¹⁰ to 10⁻⁹, 10⁻⁹ to 10⁻⁸, 10⁻⁸ to 10⁻⁷, 10⁻⁷ to 10⁻⁶, 10⁻⁶ to 10⁻⁵, 10⁻⁵ to 10⁻⁴, 10⁻⁴ to 10⁻³, 10⁻³ to 10⁻², or 10⁻² to 10⁻¹ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. In some embodiments, a fraction of tumor DNA greater than a threshold of at least 10⁻⁷ is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence. A determination that a fraction of tumor DNA is greater than a threshold, such as a threshold corresponding to any of the foregoing embodiments, may be made based on a cumulative probability. For example, the sample may be considered positive if the cumulative probability that the tumor fraction is greater than a threshold in any of the foregoing ranges exceeds a probability threshold of at least 0.5, 0.75, 0.9, 0.95, 0.98, 0.99, 0.995, or 0.999. In some embodiments, the probability threshold is at least 0.95, such as 0.99.
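The cumulative-probability determination described above can be illustrated with the following minimal Python sketch. It assumes, purely for illustration, that uncertainty about the tumor fraction is available as a discrete distribution mapping candidate tumor fraction values to probability weights; how such a distribution is obtained is not specified here, and the function names and example numbers are hypothetical.

```python
from typing import Mapping


def prob_tumor_fraction_exceeds(distribution: Mapping[float, float],
                                tf_threshold: float) -> float:
    """Cumulative probability that the tumor fraction exceeds tf_threshold.

    distribution: hypothetical mapping of tumor fraction -> weight.
    Weights are normalized here, so they need not sum to exactly 1.
    """
    total = sum(distribution.values())
    if total <= 0:
        raise ValueError("distribution must have positive total weight")
    above = sum(w for tf, w in distribution.items() if tf > tf_threshold)
    return above / total


def sample_is_positive(distribution: Mapping[float, float],
                       tf_threshold: float = 1e-7,
                       prob_threshold: float = 0.95) -> bool:
    """Positive when the cumulative probability exceeds prob_threshold."""
    return prob_tumor_fraction_exceeds(distribution, tf_threshold) > prob_threshold


if __name__ == "__main__":
    # Illustrative distribution: most of the weight lies above 1e-7.
    example = {0.0: 0.02, 1e-6: 0.48, 1e-4: 0.50}
    print(sample_is_positive(example, tf_threshold=1e-7, prob_threshold=0.95))
    print(sample_is_positive(example, tf_threshold=1e-7, prob_threshold=0.99))
```

The defaults of 10⁻⁷ and 0.95 are taken from values named in the text; any of the other recited thresholds could be substituted.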

In some embodiments, the set of sequence information comprises sequence-variable target region sequences and epigenetic target region sequences, and determining the cancer recurrence score comprises determining a first subscore indicative of the amount of SNVs, insertions/deletions, CNVs and/or fusions present in sequence-variable target region sequences and a second subscore indicative of the amount of abnormal molecules in epigenetic target region sequences, and combining the first and second subscores to provide the cancer recurrence score. Where the first and second subscores are combined, they may be combined by applying a threshold to each subscore independently (e.g., greater than a predetermined number of mutations (e.g., >1) in sequence-variable target regions, and greater than a predetermined fraction of abnormal molecules (i.e., molecules with an epigenetic state different from the DNA found in a corresponding sample from a healthy subject; e.g., tumor) in epigenetic target regions), or training a machine learning classifier to determine status based on a plurality of positive and negative training samples.
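The independent-threshold option described above can be sketched as follows. This is a minimal illustration under stated assumptions: the `Subscores` container, the function name `combine_subscores`, and the default cutoffs are hypothetical, and because the text does not fix whether the two per-subscore calls are joined so that both must pass or either may pass, that choice is exposed as the `require_both` parameter. The machine learning alternative is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscores:
    """Hypothetical container for the two subscores."""
    mutation_count: int        # first subscore: SNVs, indels, CNVs, fusions
    abnormal_fraction: float   # second subscore: fraction of abnormal molecules


def combine_subscores(s: Subscores,
                      min_mutations: int = 1,
                      min_abnormal_fraction: float = 1e-4,
                      require_both: bool = False) -> bool:
    """Apply a threshold to each subscore independently, then combine.

    Each comparison is strict ("greater than"), following the example in
    the text (e.g., >1 mutation). The cutoff defaults are illustrative.
    """
    first_positive = s.mutation_count > min_mutations
    second_positive = s.abnormal_fraction > min_abnormal_fraction
    if require_both:
        return first_positive and second_positive
    return first_positive or second_positive


if __name__ == "__main__":
    sample = Subscores(mutation_count=2, abnormal_fraction=5e-5)
    print(combine_subscores(sample))                     # either subscore suffices
    print(combine_subscores(sample, require_both=True))  # both must pass
```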

In some embodiments, a value for the combined score in the range of −4 to 2 or −3 to 1 is sufficient for the cancer recurrence score to be classified as positive for cancer recurrence.

In any embodiment where a cancer recurrence score is classified as positive for cancer recurrence, the cancer recurrence status of the subject may be at risk for cancer recurrence and/or the subject may be classified as a candidate for a subsequent cancer treatment.

In some embodiments, the cancer is any one of the types of cancer described elsewhere herein, e.g., colorectal cancer.

3. Therapies and Related Administration

In certain embodiments, the methods disclosed herein relate to identifying and administering customized therapies to patients given the status of a nucleic acid variant as being of somatic or germline origin. In some embodiments, essentially any cancer therapy (e.g., surgical therapy, radiation therapy, chemotherapy, and/or the like) may be included as part of these methods. Typically, customized therapies include at least one immunotherapy (or an immunotherapeutic agent). Immunotherapy refers generally to methods of enhancing an immune response against a given cancer type. In certain embodiments, immunotherapy refers to methods of enhancing a T cell response against a tumor or cancer.

In certain embodiments, the status of a nucleic acid variant from a sample from a subject as being of somatic or germline origin may be compared with a database of comparator results from a reference population to identify customized or targeted therapies for that subject. Typically, the reference population includes patients with the same cancer or disease type as the test subject and/or patients who are receiving, or who have received, the same therapy as the test subject. A customized or targeted therapy (or therapies) may be identified when the nucleic acid variant and the comparator results satisfy certain classification criteria (e.g., are a substantial or an approximate match).

In certain embodiments, the customized therapies described herein are typically administered parenterally (e.g., intravenously or subcutaneously). Pharmaceutical compositions containing an immunotherapeutic agent are typically administered intravenously. Certain therapeutic agents are administered orally. However, customized therapies (e.g., immunotherapeutic agents, etc.) may also be administered by methods such as, for example, buccal, sublingual, rectal, vaginal, intraurethral, topical, intraocular, intranasal, and/or intraauricular, which administration may include tablets, capsules, granules, aqueous suspensions, gels, sprays, suppositories, salves, ointments, or the like.

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the invention. It is therefore contemplated that the disclosure shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

While the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be clear to one of ordinary skill in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the disclosure and may be practiced within the scope of the appended claims. For example, all the methods, systems, computer readable media, and/or component features, steps, elements, or other aspects thereof can be used in various combinations.

H. Kits

Also provided are kits comprising the compositions as described herein. The kits can be useful in performing the methods as described herein. In some embodiments, a kit comprises a first reagent for partitioning a sample into a plurality of subsamples as described herein, such as any of the partitioning reagents described elsewhere herein. In some embodiments, a kit comprises a second reagent for subjecting the first subsample to a procedure that affects a first nucleobase in the DNA differently from a second nucleobase in the DNA of the first subsample, wherein the first nucleobase is a modified or unmodified nucleobase, the second nucleobase is a modified or unmodified nucleobase different from the first nucleobase, and the first nucleobase and the second nucleobase have the same base pairing specificity (e.g., any of the reagents described elsewhere herein for converting a nucleobase such as cytosine or methylated cytosine to a different nucleobase). The kit may comprise the first and second reagents and additional elements as discussed below and/or elsewhere herein.

Kits may further comprise a plurality of oligonucleotide probes that selectively hybridize to at least 5, 6, 7, 8, 9, 10, 20, 30, 40 or all genes selected from the group consisting of ALK, APC, BRAF, CDKN2A, EGFR, ERBB2, FBXW7, KRAS, MYC, NOTCH1, NRAS, PIK3CA, PTEN, RB1, TP53, MET, AR, ABL1, AKT1, ATM, CDH1, CSF1R, CTNNB1, ERBB4, EZH2, FGFR1, FGFR2, FGFR3, FLT3, GNA11, GNAQ, GNAS, HNF1A, HRAS, IDH1, IDH2, JAK2, JAK3, KDR, KIT, MLH1, MPL, NPM1, PDGFRA, PROC, PTPN11, RET, SMAD4, SMARCB1, SMO, SRC, STK11, VHL, TERT, CCND1, CDK4, CDKN2B, RAF1, BRCA1, CCND2, CDK6, NF1, TP53, ARID1A, BRCA2, CCNE1, ESR1, RIT1, GATA3, MAP2K1, RHEB, ROS1, ARAF, MAP2K2, NFE2L2, RHOA, and NTRK1. The number of genes to which the oligonucleotide probes can selectively hybridize can vary. For example, the number of genes can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, or 54. The kit can include a container that includes the plurality of oligonucleotide probes and instructions for performing any of the methods described herein.

The oligonucleotide probes can selectively hybridize to exon regions of the genes, e.g., of the at least 5 genes. In some cases, the oligonucleotide probes can selectively hybridize to at least 30 exons of the genes, e.g., of the at least 5 genes. In some cases, the multiple probes can selectively hybridize to each of the at least 30 exons. The probes that hybridize to each exon can have sequences that overlap with at least 1 other probe. In some embodiments, the oligoprobes can selectively hybridize to non-coding regions of genes disclosed herein, for example, intronic regions of the genes. The oligoprobes can also selectively hybridize to regions of genes comprising both exonic and intronic regions of the genes disclosed herein.

Any number of exons can be targeted by the oligonucleotide probes. For example, at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 400, 500, 600, 700, 800, 900, 1,000, or more, exons can be targeted.

The kit can comprise at least 4, 5, 6, 7, or 8 different library adaptors having distinct molecular barcodes and identical sample barcodes. The library adaptors may not be sequencing adaptors. For example, the library adaptors do not include flow cell sequences or sequences that permit the formation of hairpin loops for sequencing. The different variations and combinations of molecular barcodes and sample barcodes are described throughout, and are applicable to the kit. Further, in some cases, the adaptors are not sequencing adaptors. Additionally, the adaptors provided with the kit can also comprise sequencing adaptors. A sequencing adaptor can comprise a sequence hybridizing to one or more sequencing primers. A sequencing adaptor can further comprise a sequence hybridizing to a solid support, e.g., a flow cell sequence. For example, a sequencing adaptor can be a flow cell adaptor. The sequencing adaptors can be attached to one or both ends of a polynucleotide fragment. In some cases, the kit can comprise at least 8 different library adaptors having distinct molecular barcodes and identical sample barcodes. The library adaptors may not be sequencing adaptors. The kit can further include a sequencing adaptor having a first sequence that selectively hybridizes to the library adaptors and a second sequence that selectively hybridizes to a flow cell sequence. In another example, a sequencing adaptor can be hairpin shaped. For example, the hairpin shaped adaptor can comprise a complementary double stranded portion and a loop portion, where the double stranded portion can be attached (e.g., ligated) to a double-stranded polynucleotide. Hairpin shaped sequencing adaptors can be attached to both ends of a polynucleotide fragment to generate a circular molecule, which can be sequenced multiple times.
A sequencing adaptor can be up to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, or more bases from end to end. The sequencing adaptor can comprise 20-30, 20-40, 30-50, 30-60, 40-60, 40-70, 50-60, or 50-70 bases from end to end. In a particular example, the sequencing adaptor can comprise 20-30 bases from end to end. In another example, the sequencing adaptor can comprise 50-60 bases from end to end. A sequencing adaptor can comprise one or more barcodes. For example, a sequencing adaptor can comprise a sample barcode. The sample barcode can comprise a pre-determined sequence. The sample barcodes can be used to identify the source of the polynucleotides. The sample barcode can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, or more (or any length as described throughout) nucleic acid bases, e.g., at least 8 bases. The barcode can be contiguous or non-contiguous sequences, as described above.

The library adaptors can be blunt ended and Y-shaped and can be less than or equal to 40 nucleic acid bases in length. Other variations of the library adaptors can be found throughout and are applicable to the kit.

III. EXAMPLES

The following examples are provided to illustrate certain aspects of the disclosed methods. The examples do not limit the disclosure.

Example 1: Analysis of cfDNA to Detect the Presence/Absence of Tumor

A set of patient samples is analyzed by a blood-based NGS assay at Guardant Health (Redwood City, Calif., USA) to detect the presence/absence of cancer. cfDNA is extracted from the plasma of these patients. cfDNA of the patient samples is then combined with methyl binding domain (MBD) buffers and magnetic beads conjugated with an MBD protein and incubated overnight. Methylated cfDNA (if present in the cfDNA sample) is bound to the MBD protein during this incubation. Non-methylated or less methylated DNA is washed away from the beads with buffers containing increasing concentrations of salt. Finally, a high salt buffer is used to wash the heavily methylated DNA away from the MBD protein. These washes result in three partitions (hypomethylated, residual methylation and hypermethylated partitions) of increasingly methylated cfDNA.

Optionally, the cfDNA molecules in the hypermethylated partition are subjected to enzymatic modification (EM) whereby unmodified cytosines, but not mC and hmC, undergo deamination, thereby marking nonspecifically partitioned hypomethylated molecules in the first subsample by conversion of unmodified cytosines to uracils.

After concentrating the cfDNA in the partitions, the end overhangs of the partitioned cfDNA are extended, and adenosine residues are added to the 3′ ends of the cfDNA fragments by the polymerase during the extension. The 5′ end of each fragment is phosphorylated. These modifications make the partitioned cfDNA ligatable. DNA ligase and adapters are added to ligate each partitioned cfDNA molecule with an adapter on each end. These adapters contain non-unique molecular barcodes, and each partition is ligated with adapters having non-unique molecular barcodes that are distinguishable from the barcodes in the adapters used in the other partitions.

The cfDNA in the hypomethylated partition is contacted with one or more methylation-dependent nucleases. The nucleases cleave at least a portion of nonspecifically partitioned DNA in the hypomethylated partition.

After ligation, the three partitions are pooled together and are amplified by PCR. Molecules that were cleaved by the one or more methylation-dependent nucleases do not undergo exponential amplification because they do not have an adapter on each end.
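The effect of cleavage on amplification can be illustrated with the following idealized Python sketch. It is a simplified model and not a description of the assay chemistry: it assumes 100% PCR efficiency, that a fragment carrying adapters on both ends doubles every cycle, and that a cleaved fragment retaining an adapter on only one end is copied linearly (one new copy per original template per cycle). The function name and the example numbers are hypothetical.

```python
def copies_after_pcr(start_copies: int, cycles: int,
                     adapters_on_both_ends: bool) -> int:
    """Idealized copy number after a given number of PCR cycles.

    Assumptions (illustrative only): perfect efficiency; exponential
    growth needs a primer site on each end, while a single-adapter
    fragment is copied only from the original template.
    """
    if start_copies < 0 or cycles < 0:
        raise ValueError("start_copies and cycles must be non-negative")
    if adapters_on_both_ends:
        return start_copies * 2 ** cycles   # exponential amplification
    return start_copies * (1 + cycles)      # at most linear amplification


if __name__ == "__main__":
    intact = copies_after_pcr(1, 10, adapters_on_both_ends=True)
    cleaved = copies_after_pcr(1, 10, adapters_on_both_ends=False)
    print(intact, cleaved)
```

Under this model, after ten cycles an intact molecule yields 1,024 copies and a cleaved molecule 11, which is why cleaved molecules contribute little to the amplified pool.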

Following PCR, amplified DNA is washed and concentrated prior to enrichment. Once concentrated, the amplified DNA is combined with a salt buffer and biotinylated RNA probes that comprise probes for a sequence-variable target region set and probes for an epigenetic target region set and this mixture is incubated overnight. The probes for the sequence-variable target region set have a footprint of about 50 kb and the probes for the epigenetic target region set have a footprint of about 500 kb. The probes for the sequence-variable target region set comprise oligonucleotides targeting at least a subset of genes identified in Tables 3-5 and the probes for the epigenetic target region set comprise oligonucleotides targeting a selection of hypermethylation variable target regions, hypomethylation variable target regions, CTCF binding target regions, transcription start site target regions, focal amplification target regions and methylation control regions.

The biotinylated RNA probes (hybridized to DNA) are captured by streptavidin magnetic beads and separated from the amplified DNA that is not captured by a series of salt-based washes, thereby enriching the sample. After enrichment, an aliquot of the enriched sample is sequenced using an Illumina NovaSeq sequencer. The sequence reads generated by the sequencer are then analyzed using bioinformatic tools/algorithms. The molecular barcodes are used to identify unique molecules as well as for deconvolution of the sample into molecules that were differentially MBD-partitioned. The method described in this example, apart from providing information on the overall level of methylation (i.e., methylated cytosine residues) of a molecule based on its partition, including with increased accuracy and/or confidence due to the cleavage of nonspecifically partitioned cfDNA in the hypomethylated partition, can also provide higher-resolution information about the location of methylated cytosines based on the conversion of unmethylated cytosines in the hypermethylated partition. The sequence-variable target region sequences are analyzed by detecting genomic alterations such as SNVs, insertions, deletions and fusions that can be called with enough support to differentiate real tumor variants from technical errors (e.g., PCR errors, sequencing errors). The epigenetic target region sequences are analyzed independently to detect the methylation status of cfDNA molecules in regions that have been shown to be differentially methylated, e.g., in potentially cancerous tissue compared to healthy cfDNA. Finally, the results of both analyses are combined to produce a final tumor present/absent call.
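
The barcode-based deconvolution described above can be illustrated with a minimal sketch. It is not the bioinformatic pipeline used in this example: the barcode sequences, the `BARCODE_TO_PARTITION` table, the partition labels, and the function name `deconvolve` are all hypothetical, and reads sharing a barcode and alignment coordinates are simply assumed to be copies of one original molecule.

```python
from collections import Counter

# Hypothetical partition-specific barcodes (made-up sequences and labels).
BARCODE_TO_PARTITION = {
    "ACGT": "hyper",
    "TGCA": "intermediate",
    "GATC": "hypo",
}


def deconvolve(reads):
    """Count unique molecules per partition.

    `reads` is an iterable of (barcode, chrom, start, end) tuples. Reads with
    the same barcode and coordinates are collapsed into a single molecule.
    """
    unique_molecules = set(reads)
    counts = Counter()
    for barcode, _chrom, _start, _end in unique_molecules:
        # Barcodes not in the table are reported separately, not guessed.
        partition = BARCODE_TO_PARTITION.get(barcode, "unassigned")
        counts[partition] += 1
    return counts
```

In this sketch a duplicated read contributes only once to its partition's count, which mirrors the use of molecular barcodes both to identify unique molecules and to assign them to a partition.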

Example 2: Analysis of Methylation at Single Nucleotide Resolution in cfDNA Samples from Healthy Subjects and Subjects with Early-Stage Colorectal Cancer

Samples of cfDNA from healthy subjects and subjects with early-stage colorectal cancer were analyzed as follows. cfDNA was partitioned using MBD to provide a hypermethylated partition, an intermediate partition, and a hypomethylated partition. The partitioned DNA of each partition was ligated to adapters and subjected to an EM-seq conversion procedure whereby unmodified cytosines, but not mC and hmC, undergo deamination, although in an alternative procedure the partitioned DNA of the hypomethylated partition could be contacted with a methylation-dependent nuclease as described herein. Following such deamination, the partitions were prepared for sequencing and subjected to whole-genome sequencing. Each partition was sequenced separately, although in an alternative procedure the partitions could be differentially tagged (e.g., after partitioning and before EM-seq conversion, or after partitioning and EM-seq conversion and before further preparation for sequencing), pooled, and processed and sequenced in parallel.

Sequence data from hypermethylation variable target regions was isolated bioinformatically, although in an alternative procedure target regions could be enriched in vitro before sequencing. Per-base methylation for the hypermethylation variable target regions was quantified as shown in FIG. 7, which shows the number of methylated CpGs per molecule in the hypermethylation variable target regions from the hypermethylated partition. The x-axis indicates the total number of CpGs per molecule, such that points along the diagonal represent molecules with methylation at every CpG. Thus, it was possible to analyze methylation at single-base resolution and quantify per-base methylation and partial molecule methylation of the MBD-partitioned material. The samples from subjects with colorectal cancer exhibited much higher overall methylation in these regions than samples from healthy subjects.
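
The per-molecule quantities described above (methylated CpGs versus total CpGs) can be sketched as follows. This is an illustrative simplification, not the analysis actually performed: the function name `cpg_methylation` is hypothetical, and the read is assumed to be a gapless, same-length, top-strand alignment to the reference. After conversion, a protected cytosine (mC or hmC) still reads as C, whereas an unmodified cytosine reads as T.

```python
def cpg_methylation(reference, read):
    """Return (methylated_cpgs, total_cpgs) for one converted molecule.

    Only reference CpG positions where the read shows C (protected) or
    T (deaminated) are counted; any other base is treated as a mismatch
    and skipped.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length")
    methylated = 0
    total = 0
    for i in range(len(reference) - 1):
        if reference[i:i + 2] != "CG":
            continue
        if read[i] == "C":
            methylated += 1
            total += 1
        elif read[i] == "T":
            total += 1
    return methylated, total
```

A molecule for which both returned values are equal would fall on the diagonal, i.e., it is methylated at every CpG that could be assessed.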

Example 3: Reduction of Technical Noise by Digestion of Nonspecifically Partitioned DNA

cfDNA from two healthy normal samples was pooled, and 18.6 ng of the pool was used as input to an MBD-partitioning assay described herein. To a subset of the samples, cfDNA from a colorectal cancer (CRC) sample with 0.5% MAF (mutant allele fraction) was added, resulting in a diluted CRC sample with 0.16% MAF. Three sets of normal samples and diluted CRC samples were used in the assay. The three sets of samples were then partitioned using MBD protein into three partitions (hyper, residual and hypo partitions). Following cleanup, the cfDNA molecules in each partition were ligated with partition-specific adapters comprising molecular barcodes. The molecular barcodes used in the hyper and residual partitions are selected such that they do not have MSRE recognition sites, so they are not digested in the downstream processing (irrespective of cfDNA methylation state). Post-ligation, ligation cleanups were performed. Following the ligation cleanup, the hyper and residual partitions were subjected to MSRE digestion reactions. A first set of the samples (normal and diluted CRC samples) was treated with BstUI and HpaII, and another set of the samples was treated with BstUI, HpaII and Hin6I. The third set of samples was run through a mock digest (no MSREs) in the MBD-partitioning assay as a control. After the MSRE digestion, the enzymes were heat inactivated (65 °C, 20 min) and the digests were cleaned up using SPRI beads. After the digest cleanups, the hyper, residual and (non-digested) hypo partitions (adapter-ligated cfDNA) were combined and processed through an NGS assay workflow comprising PCR amplification; enrichment of molecules in genomic regions of interest; pooling of samples, thereby allowing multiplexed sequencing; and sequencing the pooled sample using NovaSeq. In an alternative procedure, the hypo partition may be contacted with one or more methylation-dependent nucleases, such as any of the MDREs described herein, to cleave nonspecifically partitioned DNA in the hypo partition.
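
As a worked illustration of the dilution described above, the fraction of CRC cfDNA needed in a mixture can be estimated from the two MAF values. This sketch assumes that the normal pool contributes no mutant alleles at the measured loci; the function name `spike_fraction` is hypothetical and the calculation is not taken from the example itself.

```python
def spike_fraction(source_maf, target_maf):
    """Fraction of the final mixture that must come from the spiked sample.

    Assumes the diluent (normal cfDNA) has a MAF of zero, so the mixture MAF
    scales linearly with the fraction of spiked cfDNA.
    """
    if source_maf <= 0:
        raise ValueError("source_maf must be positive")
    if not 0 <= target_maf <= source_maf:
        raise ValueError("target_maf must lie between 0 and source_maf")
    return target_maf / source_maf


# 0.5% MAF diluted to 0.16% MAF, as stated in the text.
fraction_crc = spike_fraction(0.5, 0.16)
print(f"CRC cfDNA fraction of mixture: {fraction_crc:.2f}")
```

Under that assumption, a mixture in which roughly one third of the cfDNA comes from the 0.5% MAF sample would have a MAF of about 0.16%.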

FIG. 6 shows the increase in cancer methylation signal at DMRs relative to the technical noise from unmethylated molecules in normal samples when the MSRE digestion was applied. In the negative control regions (where the DNA molecules are almost always unmethylated, irrespective of the disease state) shown in FIG. 6, “a” indicates that the MSRE digestion removes the unmethylated molecules that mis-partitioned into the hyper partition: 90 molecules were partitioned into the hyper partition in the mock digestion, whereas with BstUI, HpaII and Hin6I digestion the molecule count was reduced to 10. In the classification DMRs shown in FIG. 6, cfDNA molecules were removed in a much higher proportion in normal samples (b; 350→100) than in diluted CRC samples (c; 1500→1100) upon digestion with MSREs.
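
The molecule counts quoted above can be turned into removal fractions with a short calculation. The sketch below uses only the counts stated in the text (mock digest, then MSRE digest); the dictionary labels and the function name `fraction_removed` are illustrative choices, and the cancer-to-normal ratio is offered as one possible way to summarize the change, not as a metric defined in this disclosure.

```python
# Counts stated in the text: (mock digest, MSRE digest).
COUNTS = {
    "a: negative control regions": (90, 10),
    "b: classification DMRs, normal": (350, 100),
    "c: classification DMRs, diluted CRC": (1500, 1100),
}


def fraction_removed(mock, digested):
    """Fraction of molecules lost between the mock and MSRE digests."""
    if mock <= 0:
        raise ValueError("mock count must be positive")
    return 1 - digested / mock


for label, (mock, digested) in COUNTS.items():
    print(f"{label}: {fraction_removed(mock, digested):.0%} removed")

# Ratio of diluted-CRC to normal molecule counts at classification DMRs.
ratio_mock = 1500 / 350
ratio_digested = 1100 / 100
print(f"CRC/normal ratio: {ratio_mock:.1f} (mock) vs {ratio_digested:.1f} (MSRE)")
```

With these counts, roughly 89% of negative control molecules and 71% of normal-sample DMR molecules are removed, compared with about 27% of diluted-CRC DMR molecules.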

Example 4: Analysis of MDRE-Digested cfDNA

Multiple aliquots of cfDNA from two healthy donors were isolated and subjected to MBD-based partitioning of methylated cfDNA. The hypomethylated cfDNA partition was then subjected to ligation of NGS adapters onto the cfDNA molecules. Ligated cfDNA from each donor was then subjected to methylation-dependent restriction enzyme (MDRE) digestion with FspEI, LpnPI, MspJI, or SgeI or, as control reactions, to a ‘mock’ digestion (no enzyme added to the digestion) or an undigested condition in which the MDRE reaction was skipped. After the MDRE step, the hypomethylated cfDNA partition was amplified in a universal PCR in which DNA that had been cleaved by the MDRE was not exponentially amplified because adapters were not present at each end. The PCR products were then subjected to enrichment of targeted genomic regions using a hybrid capture panel, amplified in a second PCR, and sequenced by NGS. The hybrid capture panel targets include ‘positive control (ctrl)’ and ‘negative control (ctrl)’ regions of the genome for enrichment. Positive control regions are CpG-dense regions of the genome that are found to be ubiquitously highly methylated (>85% methylation by bisulfite-seq) in all human tissues, including blood and cancerous tissue. Conversely, negative control regions are ubiquitously unmethylated (<15% methylation) in all human tissues. From the NGS analysis, the number of positive control molecules (i.e., molecules in the positive control regions) and negative control molecules (i.e., molecules in the negative control regions) sequenced in all the conditions are compared to estimate MDRE sensitivity and specificity, respectively. FIGS. 8A-B show that the FspEI enzyme treatment reduced the number of positive control molecules >100-fold compared to the ‘mock’ condition, demonstrating ~99% sensitivity with respect to digestion of methylated molecules. FIGS. 8C-D show that the FspEI treatment does not meaningfully reduce the negative control molecules, indicating high specificity of FspEI digestion (it does not digest unmethylated molecules). Note that MspJI shows some sensitivity but poor specificity compared to FspEI, while LpnPI and SgeI show little or no sensitivity.

The MDRE digestion efficiency was calculated using molecules with different recognition sites and number of sites per molecule. Digestion efficiency is calculated as 1 − [number of positive control molecules in MDRE condition]/[number of positive control molecules in the mock condition]. The general recognition sequence of FspEI that includes a ^(5m)CpG is C^(5m)CGH (H=A, C, or T), with cleavage occurring 12-16 bases downstream. The FspEI palindromic site C^(5m)CGG contains two FspEI recognition sites, on the top and bottom strands in opposite directions. The general ^(5m)CpG-containing consensus is ^(5m)CpGNR, which can overlap with the FspEI consensus. FIGS. 9A-D show that digestion efficiency increases with the minimum number of C^(5m)CGH or C^(5m)CGG sites per molecule and is more efficient at the palindromic site (C^(5m)CGG). Positive control molecules with at least one C^(5m)CGG or at least two C^(5m)CGH sites were cleaved with 95% efficiency.
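
The efficiency formula and the site counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in this example: the function names are hypothetical, the molecule counts in the usage line are invented for illustration, and every CpG in a positive control molecule is assumed to be methylated so that the plain-sequence motifs CCGH and CCGG can stand in for C^(5m)CGH and C^(5m)CGG.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def digestion_efficiency(mdre_count, mock_count):
    """1 - [positive control molecules, MDRE] / [positive control molecules, mock]."""
    if mock_count <= 0:
        raise ValueError("mock_count must be positive")
    return 1 - mdre_count / mock_count


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq, pattern):
    # A lookahead is used so that overlapping occurrences are all counted.
    return len(re.findall(f"(?={pattern})", seq))


def site_counts(seq):
    """Count CCGH sites (both strands) and palindromic CCGG sites."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    ccgh = count_motif(seq, "CCG[ACT]") + count_motif(rc, "CCG[ACT]")
    # CCGG is its own reverse complement, so one strand is sufficient.
    ccgg = count_motif(seq, "CCGG")
    return {"CCGH": ccgh, "CCGG": ccgg}


# Hypothetical counts: a 100-fold reduction corresponds to 99% efficiency.
print(digestion_efficiency(mdre_count=10, mock_count=1000))
print(site_counts("ACCGATTCCGGA"))
```

Grouping molecules by the values returned from `site_counts` and computing `digestion_efficiency` per group is one way to obtain efficiency as a function of the minimum number of sites per molecule.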

Additionally, digestion with FspEI and MspJI simultaneously or sequentially was tested. Sequential digestion with the two MDREs (FspEI then MspJI) had the highest efficiency. It is possible that in the simultaneous digestion (FspEI and MspJI), MspJI sometimes binds to the DNA but does not cleave (lower individual efficiency), thus sterically blocking the FspEI activity. Although FspEI then MspJI has higher overall efficiency than FspEI alone here, FspEI alone has better cleavage specificity. Thus, in different circumstances, digestion with FspEI alone or with FspEI then MspJI may be preferable. Note that with higher numbers of minimum sites there are fewer positive control molecules observed (FIGS. 9C-D), and thus the digestion efficiency estimate becomes noisier.

Example 5: Detection of Tumor DNA Following MDRE Treatment

cfDNA isolated from four healthy donors was used to create ‘normal’ and simulated ‘cancer’ cfDNA samples. The donor samples were used neat as ‘normals’ and spiked with the cfDNA of a colorectal cancer (CRC) patient to create a ‘cancer’ sample. The circulating tumor DNA fraction of the CRC cfDNA sample had been previously measured and was used to spike a calculated amount of CRC cfDNA into the normal donor cfDNA such that the resulting ‘cancer’ sample contained 0.5% circulating tumor DNA (“0.5% CRC” in FIGS. 10A-J). All the samples were subjected to MBD-based partitioning, splitting the cfDNA into hypermethylated and hypomethylated cfDNA partitions. The hypomethylated cfDNA partition was then ligated to NGS adapters. Ligated cfDNA from each donor was then subjected to an MDRE digestion with either FspEI, MspJI or FspEI+MspJI. A ‘mock digestion’ (no enzyme added to the digestion reaction) and a ‘no digestion’ condition (MDRE reaction skipped altogether) served as control reactions. After the MDRE step, the non-digested hypomethylated partition cfDNA was amplified in a universal PCR, then subjected to enrichment of targeted genomic regions using a hybrid capture panel, and then amplified in a second PCR and sequenced by NGS. The hybrid capture panel targets include hypomethylation variable target regions and ‘negative control (ctrl)’ regions of the genome for enrichment. Negative control regions are CpG-dense regions of the genome that are found to be ubiquitously lowly methylated (<15% methylation by bisulfite-seq) in all human tissues, including blood and cancerous tissue. The hypomethylation variable target regions are genomic regions annotated in the literature as having a reduced methylation percentage in CRC tissue compared to healthy colon tissue and blood. From the NGS analysis, the number of hypomethylation variable target region molecules with 2 CCGG sites or more (which should be digested with high efficiency by the MDRE) is compared between ‘normal’ and ‘cancer’ samples across all the digestion conditions (FIGS. 10A-E). The ratios of the hypomethylation variable target region molecule counts to the negative control molecule counts were also compared, which normalizes for varying cfDNA input amounts that can affect the hypomethylation variable target region molecule counts (FIGS. 10F-J). No resolvable detection of the hypomethylation variable target region cancer signals was observed in the no MDRE digestion conditions (‘no digestion’ and ‘mock digestion’). That is, the hypomethylation variable target region molecule counts and the normalized ratio levels were indistinguishable (not significantly different) between the ‘cancer’ and ‘normal’ samples (this is marked by the horizontal arrows in FIGS. 10C, E, H, and J). Conversely, when there was an MDRE treatment, a shift (increase) in the hypomethylation variable target region counts and normalized ratio was detected in the ‘cancer’ as compared to the ‘normal’ samples (marked by the upward right arrow in FIGS. 10A, B, D, F, G, and I). Thus, the MDRE treatment enables detection of a cancer hypomethylation variable target region signal in the ‘cancer’ samples at 0.5% CRC ctDNA that is not detectable by the MBD-partitioning assay alone.
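
The normalization described above, in which hypomethylation variable target region molecule counts are divided by negative control molecule counts, can be sketched as follows. The function name `normalized_ratio` and all counts shown are hypothetical and serve only to illustrate why the ratio is insensitive to the amount of cfDNA input; they are not data from FIGS. 10A-J.

```python
def normalized_ratio(hypo_region_count, negative_control_count):
    """Hypomethylation variable target region count per negative control molecule."""
    if negative_control_count <= 0:
        raise ValueError("negative_control_count must be positive")
    return hypo_region_count / negative_control_count


# Hypothetical illustration: doubling the cfDNA input doubles both counts,
# so the raw count changes but the normalized ratio does not.
low_input = normalized_ratio(hypo_region_count=50, negative_control_count=1000)
high_input = normalized_ratio(hypo_region_count=100, negative_control_count=2000)
print(low_input == high_input)
```

Because the negative control regions are expected to behave the same way in ‘normal’ and ‘cancer’ samples, a difference in this ratio between samples is less likely to reflect input amount alone.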

While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the invention. It is therefore contemplated that the disclosure shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.

While the foregoing disclosure has been described in some detail by way of illustration and example for purposes of clarity and understanding, it will be clear to one of ordinary skill in the art from a reading of this disclosure that various changes in form and detail can be made without departing from the true scope of the disclosure and may be practiced within the scope of the appended claims. For example, all the methods, systems, computer readable media, and/or component features, steps, elements, or other aspects thereof can be used in various combinations.

All patents, patent applications, websites, other publications or documents, accession numbers and the like cited herein are incorporated by reference in their entirety for all purposes to the same extent as if each individual item were specifically and individually indicated to be so incorporated by reference. If different versions of a sequence are associated with an accession number at different times, the version associated with the accession number at the effective filing date of this application is meant. The effective filing date means the earlier of the actual filing date or filing date of a priority application referring to the accession number, if applicable. Likewise, if different versions of a publication, website or the like are published at different times, the version most recently published at the effective filing date of the application is meant, unless otherwise indicated.

1. A method of analyzing DNA in a sample, the method comprising: a) partitioning the sample into a plurality of subsamples, including a first subsample and a second subsample, wherein the first subsample comprises DNA with a cytosine modification in a greater proportion than the second subsample; b) contacting the second subsample with a methylation-dependent nuclease, thereby degrading nonspecifically partitioned DNA in the second subsample to produce a treated second subsample and optionally contacting the first subsample with a methylation-sensitive endonuclease, thereby degrading nonspecifically partitioned DNA in the first subsample to produce a treated first subsample; and c) capturing a first target region set comprising epigenetic target regions from at least a portion of the first subsample or the treated first subsample, and capturing a second target region set comprising epigenetic target regions from at least a portion of the treated second subsample.
2. (canceled)
3. The method of claim 1, further comprising quantifying epigenetic target regions captured from or present in one or more of the first subsample, the treated first subsample, or the treated second subsample.
4. (canceled)
5. The method of claim 1, further comprising sequencing DNA in the first target region set and the second target region set or in the treated second subsample.
6. The method of claim 5, wherein DNA in the treated second subsample and DNA in the treated first subsample is sequenced.
7. The method of claim 1, wherein the epigenetic target regions comprise a hypomethylation variable target region set.
8. The method of claim 7, wherein the hypomethylation variable target region set comprises regions having a lower degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.
9. The method of claim 8, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.
10. The method of claim 7, further comprising quantifying tumor DNA in the sample based at least in part on sequences or quantities of regions in the hypomethylation variable target region set.
11-15. (canceled)
16. The method of claim 1, wherein the DNA comprises cell-free DNA (cfDNA) obtained from a test subject.
17. The method of claim 1, wherein the cytosine modification is methylation.
18. The method of claim 1, wherein the cytosine modification is methylation at the 5 position of cytosine.
19. The method of claim 1, wherein the first subsample is contacted with a methylation-sensitive endonuclease and the methylation-sensitive endonuclease cleaves an unmethylated CpG sequence.
20. (canceled)
21. (canceled)
22. (canceled)
23. The method of claim 1, wherein the methylation-dependent endonuclease cleaves a methylated CpG sequence.
24. The method of claim 1, wherein the methylation-dependent endonuclease is one or more of MspJI, LpnPI, FspEI, or McrBC.
25. The method of claim 1, wherein the first target region set comprises a hypermethylation variable target region set that comprises regions having a higher degree of methylation in at least one type of tissue than the degree of methylation in cell-free DNA from a healthy subject.
26. (canceled)
27. The method of claim 25, wherein the method further comprises determining a presence, absence, or likelihood of cancer based at least in part on sequences or quantities of regions in the hypermethylation variable target region set.
28. (canceled)
29. (canceled)
30. The method of claim 1, wherein the first and/or second epigenetic target region set comprise a fragmentation variable target region set.
31. The method of claim 30, wherein the fragmentation variable target region set comprises transcription start site regions and/or CTCF binding regions.
32. (canceled)
33. The method of claim 1, wherein the first target region set and/or second target region set further comprises sequence-variable target regions.
34-47. (canceled)
48. The method of claim 1, comprising differentially tagging the first subsample and second subsample, the treated first subsample and the second subsample, the first subsample and the treated second subsample, or the treated first subsample and the treated second subsample, wherein the treated first subsample and the second subsample, the first subsample and the treated second subsample, or the treated first subsample and the treated second subsample are pooled after contacting the first subsample with the methylation-sensitive endonuclease and/or contacting the second subsample with a methylation-dependent nuclease.
49. (canceled)
50. (canceled)
51. The method of claim 1, wherein the plurality of subsamples comprises a third subsample, which comprises DNA with a cytosine modification in a greater proportion than the second subsample but in a lesser proportion than the first subsample.
52. The method of claim 51, wherein the method further comprises differentially tagging the third subsample, and the first, second, and third subsamples are combined after contacting the first subsample with the methylation-sensitive endonuclease and/or contacting the second subsample with a methylation-dependent nuclease, optionally wherein DNA from the first, second, and third subsamples is sequenced in the same sequencing cell.
53. (canceled)
54. The method of claim 51, wherein the third subsample is contacted with a methylation-sensitive endonuclease.
55. The method of claim 52, wherein the third subsample is combined with the first subsample, and the combined first and third subsamples are contacted with a methylation-sensitive endonuclease.
56-90. (canceled)